Run an agentic evaluation

Xanadu Enable AI

Release version: Xanadu

Updated April 3, 2025

Run an evaluation of an agentic workflow against a dataset of your choice to monitor its performance and compare it against different benchmarks.

Before you begin

Evaluation runs require execution log data from the agentic workflow that you want to evaluate. For a new agentic workflow, you can create execution logs by testing it in AI Agent Studio. For more information about testing agentic workflows, see Test an agentic workflow.

For more information about getting started with agentic evaluations, see General guidelines for agentic evaluation runs.

Role required: sn_aia.admin

Procedure

Navigate to All > Now Assist Skill Kit > Agentic Evaluations.

You can also start from the testing page of AI Agent Studio. Navigate to All > AI Agent Studio > Testing. Select an agentic workflow, and then select Set up evaluation run. A modal asks whether you want to be redirected to Now Assist Skill Kit. Select Open Skill Kit. You're redirected to the guided setup.
On the evaluations home page, select New evaluation run to begin the guided setup.
In the Add general info step, add a name and select the agentic workflow that you want to evaluate.
Select Continue to go to the next step.
Each time you move to another step, the evaluation run is saved automatically as a draft. You can also select Save as draft at any point.

To exit the guided setup, select Exit setup. You're redirected to the Agentic Evaluations page.
- If you select Save and exit, the evaluation run appears in the list on the Agentic Evaluations page with the status of Draft.
- If you select Discard and exit, the evaluation run draft is deleted.
Select your evaluation method.

Overall task completeness evaluation is selected by default. Running multiple evaluation methods at a time can help provide a more comprehensive overview of the agentic workflow's performance.

To see more information about each plan, expand the card for that evaluation plan by selecting the chevron icon.

Choose your dataset.

Select an existing dataset or create your own.

To create a new dataset, fill out the form.

Table 1. Choose a dataset form
Field name	Description
Name	Name of the dataset.
Description	General description of the dataset and its intended purpose.
Max records (optional)	The maximum number of records in the dataset to run the evaluation on. If the dataset contains more records than this maximum, the records beyond the maximum are ignored for that evaluation run.
Filters	Conditions for narrowing down the AI execution log records you want to include in the dataset. By default, the agentic workflow that you’re evaluating is selected as a filter condition.

Figure: Example of the form for a dataset named Categorize Incident Agentic Evaluation, with no maximum number of records and a filter where the usecase field is categorize incident.

Select See preview to see a list of records based on the conditions you specified.
You can narrow down the records further by only selecting some of the records in the preview list. Unselected records won’t be included in the dataset.

Review the agentic evaluation details in the last step of the guided setup.

To make changes, select Back to return to a previous step, or select the step in the sidebar.
Select Start evaluation.
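
The dataset options in the Choose your dataset step (Filters, Max records, and the preview selection) can be illustrated with a minimal Python sketch. This is not platform code or a platform API: the `ExecutionLog` class, its field names, and the `build_dataset` function are hypothetical and exist only to show how the options might combine. The order in which the preview selection and the record limit are applied is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ExecutionLog:
    """Hypothetical stand-in for one AI execution log record."""
    record_id: str
    workflow: str
    usecase: str


def build_dataset(
    logs: Iterable[ExecutionLog],
    filters: Iterable[Callable[[ExecutionLog], bool]],
    max_records: Optional[int] = None,
    selected_ids: Optional[set[str]] = None,
) -> list[ExecutionLog]:
    """Illustrative model of how the dataset form options narrow the records."""
    conditions = list(filters)
    # Filters: keep only the records that satisfy every condition.
    matched = [log for log in logs if all(cond(log) for cond in conditions)]
    # Preview selection: unselected records are not included in the dataset.
    # (Applying this before the record limit is an assumption of this sketch.)
    if selected_ids is not None:
        matched = [log for log in matched if log.record_id in selected_ids]
    # Max records (optional): records past the maximum are ignored.
    if max_records is not None:
        matched = matched[:max_records]
    return matched


# Made-up sample records for illustration only.
logs = [
    ExecutionLog("a1", "Categorize incident", "categorize incident"),
    ExecutionLog("a2", "Categorize incident", "other"),
    ExecutionLog("a3", "Resolve incident", "categorize incident"),
    ExecutionLog("a4", "Categorize incident", "categorize incident"),
    ExecutionLog("a5", "Categorize incident", "categorize incident"),
]

# The workflow being evaluated acts as the default filter condition;
# the usecase condition mirrors the example dataset in the figure.
by_workflow = lambda log: log.workflow == "Categorize incident"
by_usecase = lambda log: log.usecase == "categorize incident"

dataset = build_dataset(logs, [by_workflow, by_usecase], max_records=2)
print([log.record_id for log in dataset])  # ['a1', 'a4']
```

In this sketch, three records match both filter conditions, and the limit of two drops the last one. Leaving `max_records` unset keeps every matching record, which corresponds to the example dataset with no maximum records.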

Result

Your evaluation run executes. The time it takes for an evaluation run to complete varies. After the run completes, you can select the evaluation on the Agentic Evaluations page to view the results.

For more information on the metrics on the results page, see Agentic evaluation run results.