General guidelines for agentic evaluation runs

Release version: Xanadu

Updated April 8, 2025

Learn about agentic evaluation runs and recommendations for evaluating your agentic workflows against datasets to check task completion, performance, and tool execution.

Overview of agentic evaluation runs

Agentic evaluation runs measure agentic workflow executions against metrics such as task completion, performance, and tool execution. You can create the datasets for these runs from agentic workflow execution logs.

When to run agentic evaluations

Run after you have collected enough data: Evaluation runs are measured against logs of agentic workflow activity on your instance, so the workflow must have run often enough to produce a representative set of logs.
Run agentic evaluations when you make significant changes: After updating the agentic workflow, run an agentic evaluation to track how well the new version performs.
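
To illustrate tracking the efficacy of a new version, the following minimal sketch compares the metric scores of two evaluation runs. It is not a platform API: the function `compare_runs`, the metric names, and the score values are all hypothetical and assume you have copied the scores out of two completed runs by hand.

```python
def compare_runs(baseline, candidate):
    """Return the per-metric change (candidate minus baseline).

    Both arguments are plain dicts of metric name -> score in [0, 1].
    Only metrics present in both runs are compared.
    """
    return {
        metric: round(candidate[metric] - baseline[metric], 4)
        for metric in baseline
        if metric in candidate
    }


# Hypothetical scores, made up for illustration only.
baseline = {"task_completion": 0.72, "tool_execution": 0.80}   # run before the change
candidate = {"task_completion": 0.81, "tool_execution": 0.78}  # run after the change

deltas = compare_runs(baseline, candidate)
regressions = [metric for metric, delta in deltas.items() if delta < 0]

print(deltas)
print(regressions)
```

In this made-up case, task completion improves while tool execution drops slightly, which is the kind of trade-off a run after each significant change can surface.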

Choosing an evaluation method

Review the evaluation method options: The agentic evaluation Guided Setup describes each evaluation method, including what it measures and how it works. You can also review the common questions in the sidebar for answers about the available metrics.
Use multiple evaluation methods at a time: Choosing multiple evaluation methods can provide a better overall picture of the agentic workflow's performance.

Creating a dataset

Use filters to target the right data: Add filters to the execution logs to control exactly what you're measuring your agentic workflow against. Filter by time frame to verify that you're measuring the latest version of a workflow. Select See preview to see a list of the matching records. You can also use the check boxes to select individual records to measure against.
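
The filtering logic described above can be sketched outside the product as follows. This is only an illustration of the idea, not how the platform implements it: the `ExecutionLog` type, its fields, the `filter_logs` helper, and the sample records are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExecutionLog:
    # Hypothetical shape of one execution log record.
    record_id: str
    workflow: str
    started_at: datetime


def filter_logs(logs, workflow, since, until=None):
    """Keep logs for one workflow that started in [since, until)."""
    return [
        log
        for log in logs
        if log.workflow == workflow
        and log.started_at >= since
        and (until is None or log.started_at < until)
    ]


# Made-up sample records.
logs = [
    ExecutionLog("rec-1", "triage", datetime(2025, 3, 1, 9, 0)),
    ExecutionLog("rec-2", "triage", datetime(2025, 4, 2, 14, 30)),
    ExecutionLog("rec-3", "routing", datetime(2025, 4, 3, 8, 15)),
    ExecutionLog("rec-4", "triage", datetime(2025, 4, 5, 11, 45)),
]

# Assume the latest version of the workflow went live on this date.
deployed_at = datetime(2025, 4, 1)

# Time-frame filter: the equivalent of previewing the matching records.
preview = filter_logs(logs, "triage", since=deployed_at)

# Manual selection: the equivalent of clearing a check box for one record.
excluded = {"rec-2"}
dataset = [log for log in preview if log.record_id not in excluded]

print([log.record_id for log in preview])
print([log.record_id for log in dataset])
```

Filtering from the deployment date onward keeps executions of the older version out of the dataset, so the scores reflect only the version you intend to measure.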