Reference for agentic evaluations

Release version: Australia

Updated March 18, 2026

Find technical reference material for roles, metrics, and output formats of agentic evaluations.

Table 1. Standard metrics available
Metric	What it measures	Ground truth required
Task completeness	Whether the agentic AI asset fully addresses the user's need	Optional
Response accuracy	Whether the agentic AI asset's response is factually accurate	Recommended
Groundedness	Whether the agentic AI asset's response is grounded in the specific context of the task	No
Coherence	Whether the agentic AI asset's response is logically structured and clear	No
Tool use accuracy	Whether the agentic AI asset selected and used the correct tool to execute its tasks	Optional
Goal adherence	Whether the agentic AI asset stayed within its defined scope and instructions	No
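
As a rough illustration of how the "Ground truth required" column might be applied when planning a run, the following Python sketch records Table 1 as a dictionary and reports which of the selected metrics could use ground truth. Only the metric names and requirement levels come from the table. The names `GROUND_TRUTH` and `metrics_wanting_ground_truth` are hypothetical and are not part of any product API.

```python
# Ground-truth requirement per metric, transcribed from Table 1.
# The dictionary and function names are illustrative, not a product API.
GROUND_TRUTH = {
    "Task completeness": "Optional",
    "Response accuracy": "Recommended",
    "Groundedness": "No",
    "Coherence": "No",
    "Tool use accuracy": "Optional",
    "Goal adherence": "No",
}


def metrics_wanting_ground_truth(selected):
    """Return the selected metrics whose requirement is not "No", in input order."""
    unknown = [m for m in selected if m not in GROUND_TRUTH]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    return [m for m in selected if GROUND_TRUTH[m] != "No"]


if __name__ == "__main__":
    print(metrics_wanting_ground_truth(["Coherence", "Response accuracy"]))
```

If the returned list is empty, the selected metrics can run on a dataset that has no ground truth field.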

Issues are categorized by the behavior that the agentic AI asset exhibited. Issues are identified separately for each metric.

Table 2. Issue categories
Category	Agentic AI asset behavior
Incomplete response	Response failed to address the user's full request
Factual error	Response contained content that isn't factually correct
Hallucination	Response contained content not grounded in the specific context of the request
Incoherent output	Response was disorganized or difficult to understand
Incorrect tool use	Selected the wrong tool or passed incorrect parameters to a tool
Scope violation	Responded to a request outside its defined operating scope
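
Because issues are reported per metric and grouped by category, a summary of a run can be built by counting issues along both dimensions. The Python sketch below shows one way to do that. The category names come from Table 2. The input shape, a sequence of `(metric, category)` pairs, and the function name `tally_issues` are assumptions made for illustration, since this page does not define the output format.

```python
from collections import Counter, defaultdict

# Issue categories transcribed from Table 2.
ISSUE_CATEGORIES = {
    "Incomplete response",
    "Factual error",
    "Hallucination",
    "Incoherent output",
    "Incorrect tool use",
    "Scope violation",
}


def tally_issues(issues):
    """Count issues per metric, then per category.

    `issues` is an iterable of (metric, category) pairs. This record
    shape is hypothetical; adapt it to the actual evaluation output.
    """
    counts = defaultdict(Counter)
    for metric, category in issues:
        if category not in ISSUE_CATEGORIES:
            raise ValueError(f"Unknown issue category: {category}")
        counts[metric][category] += 1
    # Convert to plain dictionaries so the result is easy to print or serialize.
    return {metric: dict(per_category) for metric, per_category in counts.items()}


if __name__ == "__main__":
    sample = [
        ("Coherence", "Incoherent output"),
        ("Coherence", "Incoherent output"),
        ("Groundedness", "Hallucination"),
    ]
    print(tally_issues(sample))
```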

Table 3. Data requirements for datasets in agentic evaluations
Requirement	Description
Minimum test cases	Each run requires a minimum number of test cases. Individual metrics may set their own minimums, so ensure that your dataset meets the minimum for every metric used in the run.
Supported formats	CSV and structured JSON are supported.
Ground truth field	If you use ground truth, provide it as a separate field in the dataset. Each test case must have its own ground truth value.
Data representativeness	Datasets should reflect the full range of tasks that the AI agent or agentic workflow will handle. Include edge cases and failure-prone scenarios so that testing covers realistic conditions as well as typical ones.
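
The requirements in Table 3 can be checked before a dataset is submitted. The following Python sketch loads test cases from CSV or JSON text and reports two kinds of problems: too few test cases, and test cases with no ground truth value. It is a minimal sketch under stated assumptions. The field names `input` and `ground_truth`, the function names, and the `min_cases` value are all hypothetical, because this page does not specify a schema or the actual minimum. The sketch also assumes that a JSON dataset is an array of objects.

```python
import csv
import io
import json


def load_test_cases(text, fmt):
    """Parse dataset text in one of the two supported formats: "csv" or "json"."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        data = json.loads(text)
        # Assumption: a JSON dataset is an array with one object per test case.
        if not isinstance(data, list):
            raise ValueError("Expected a JSON array of test cases")
        return data
    raise ValueError(f"Unsupported format: {fmt}")


def validate_dataset(cases, min_cases, ground_truth_field=None):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(cases) < min_cases:
        problems.append(f"{len(cases)} test cases found, {min_cases} required")
    if ground_truth_field is not None:
        # Every test case needs its own non-empty ground truth value.
        for i, case in enumerate(cases):
            value = case.get(ground_truth_field)
            if value is None or str(value).strip() == "":
                problems.append(f"test case {i} has no {ground_truth_field!r} value")
    return problems


if __name__ == "__main__":
    csv_text = (
        "input,ground_truth\n"
        "Reset my password,Password reset link sent\n"
        "Open a ticket,\n"
    )
    cases = load_test_cases(csv_text, "csv")
    for problem in validate_dataset(cases, min_cases=2, ground_truth_field="ground_truth"):
        print(problem)
```

Pass `ground_truth_field=None` when none of the selected metrics use ground truth, so that only the test case count is checked.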