Evaluating agentic AI assets
Find guidance for every stage of the agentic evaluation lifecycle, from initial setup to reevaluation.
Overview of agentic evaluations
To evaluate your agentic AI at scale, follow this workflow:
- Create your first automated evaluation run.
Get acquainted with the agentic evaluations homepage and the guided setup for an automated evaluation.
- Track and monitor progress.
An automated evaluation that is still in progress can already provide important information about agentic AI performance, so you can spot initial problems before all the results come through.
- Review the result outputs.
- See LLM-judged scores.
- Identify consistent issues.
- Trace issues back to their source.
- Apply optimizations.
- Create automated evaluation runs for other agentic workflows or AI agents.
- Create custom metrics to evaluate against your specific business needs.
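As a rough illustration of the last step, a custom metric can be thought of as a function that maps an agent response to a score against a business rule. The sketch below is not the product's API: the `AgentRun` record, the `keyword_coverage` metric, and the `failing_runs` helper are all hypothetical names chosen for the example, and the 0.5 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """One recorded agent response (hypothetical record shape)."""
    run_id: str
    response: str


def keyword_coverage(response: str, required_terms: list[str]) -> float:
    """Return the fraction of required terms found in the response.

    Matching is a case-insensitive substring check. An empty term list
    counts as fully covered.
    """
    if not required_terms:
        return 1.0
    text = response.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


def failing_runs(runs: list[AgentRun], required_terms: list[str],
                 threshold: float = 0.5) -> list[str]:
    """Return the IDs of runs whose coverage score is below the threshold."""
    return [
        run.run_id
        for run in runs
        if keyword_coverage(run.response, required_terms) < threshold
    ]


if __name__ == "__main__":
    # Illustrative data only; these are not real evaluation results.
    runs = [
        AgentRun("a", "Your refund was issued to the original card."),
        AgentRun("b", "Please try again later."),
    ]
    print(failing_runs(runs, ["refund", "card"]))
```

A deterministic metric like this one is cheap to run on every response and can sit alongside LLM-judged scores; grouping the failing run IDs is one simple way to look for consistent issues across runs.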