Getting started

Zurich Enable AI

Release

zurich

ft:locale

en-US

ft:publication_title

Zurich Enable AI

ft:clusterId

platai

bundleId

platai

workflow

Platform

Getting started with agentic evaluations

Release version: Zurich

Updated March 18, 2026

3 minutes to read

Run your first agentic evaluation to assess the performance and effectiveness of your agentic AI assets using automated metrics and datasets.

Before you begin

Before you start your first evaluation run, verify that you have the following:

An agentic AI asset that has been built and saved in AI Agent Studio
At least one version of the agentic AI asset
Available dataset with a minimum number of test cases. You can create new datasets using Now Assist Data Kit.
You have the necessary permissions to run the agentic AI asset. You can verify whether you have the right permissions by checking the user access settings of the agentic AI asset in AI Agent Studio.

Role required: sn_aia.admin

About this task

Agentic evaluations help you measure the performance and reliability of your agentic AI assets before deploying them in production environments. The evaluation process runs your agentic AI asset against test datasets and measures various metrics such as overall completeness and tool use.

You can use existing datasets, create new datasets from previous execution logs, or generate new execution logs during the evaluation process. The system provides automated analysis and optimization recommendations based on the evaluation results.

Procedure

Navigate to All > Now Assist Skill Kit > Agentic Evaluations.
The Agentic Evaluations page displays a list of existing evaluation runs and their status.
Select Create new evaluation run.
The Create Evaluation Run form opens with configuration options for your evaluation.

Complete the basic configuration fields.

Table 1. Basic configuration form
Field	Description
Name	Descriptive name for the evaluation run. Use a name that identifies the purpose and scope of the evaluation.
Agentic AI Asset	Select the agentic AI asset you want to evaluate from the list of available assets in your instance.
Version	Select the specific version of the agentic AI asset to evaluate. Different versions may have different performance characteristics.
Description	Optional description providing additional context about the evaluation objectives and scope.

For example, name your evaluation run "Customer Service Agent - Version 2.1 - Performance Test" to clearly identify what you are testing.

See Version control for agentic AI for more information about maintaining different versions of your agentic AI assets.

Select the metrics to evaluate.

Select multiple metrics to get a comprehensive evaluation of your agentic AI asset's performance across different dimensions.

Configure the data for the evaluation run.

Table 2. Data configuration options
Option	Description
Use existing dataset	Select a pre-existing dataset that contains test cases and expected outcomes for evaluation.
Create dataset from execution logs	Generate a new dataset using previous execution logs of the agentic AI asset. This option evaluates the actual historical executions of the agentic AI asset.
Generate new execution logs	Create new execution logs as part of the evaluation run. The system generates test scenarios and captures the agentic AI asset's responses.

The system validates your data configuration and displays the selection of test cases that will be included in the evaluation.

Review the configuration summary and select Submit.
The configuration summary displays all selected options, including the agentic AI asset, version, metrics, and data configuration. Verify that all settings are correct before submitting.

The evaluation run starts and appears in the Agentic Evaluations list with a status of "Running".
Monitor progress and review logs when prompted.
The evaluation process may take several minutes to hours depending on the size of your dataset and complexity of your agentic AI asset. You can track progress in real-time and review execution logs as they are generated.
Review results, issues, and traces once the run completes.
After the evaluation completes, analyze the detailed results including metric scores, identified issues, and execution traces to understand your agentic AI asset's performance.
Apply recommended optimizations and trigger a re-evaluation.
Based on the evaluation results, the system may provide optimization recommendations. Apply these recommendations to improve your agentic AI asset's performance and run additional evaluations to measure improvements.

Result

You have successfully created and executed your first agentic evaluation run. The evaluation results provide insights into your agentic AI asset's performance and areas for improvement. You can use these results to optimize your agentic AI asset before deploying it in production environments.

What to do next

After completing your first evaluation, consider the following next steps:

Schedule regular evaluations to monitor performance over time
Create additional datasets to test different scenarios and use cases
Share evaluation results with AI architects and leads