Enabling evaluations
This feature enables continuous monitoring and evaluation of virtual agent conversations in ServiceNow, helping administrators assess and improve virtual agent performance daily. It supports random conversation evaluations to diagnose and enhance key aspects of virtual agent interactions, ensuring effective task completion and reliable user experiences.
Enabling Evaluations
- Activate Required Skills: Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform and enable specific skills such as Intent Accuracy, Slot Filling, Coherence, Truthfulness, Conciseness, and others related to chat evaluation.
- Set Evaluation Limits: Configure the system property sn_na_conv_eval.maxEvaluateCount to define the maximum number of daily conversation evaluations by updating its value under All > System Properties.
- Schedule Jobs: Activate the scheduled jobs filtered by the Conversation Evaluator application, except the one that runs only once after installation.
- Execute Evaluation Flow: Activate the Execute Evaluation flow in Flow Designer to enable real-time evaluation on chat completion. Alternatively, use the nightly scheduled job for batch evaluations to avoid conflicts with other LLM services.
- Important Notes: Domain separation is not supported. Batch evaluations require activating the Execute Batch Evaluation flow. Custom evaluation parameters can be configured as needed.
Evaluation Dashboard vs. Conversation Insights
These two tools provide complementary views of virtual agent effectiveness:
- Evaluation Dashboard: Focuses on granular diagnostic metrics that assess virtual agent system performance in critical areas such as intent recognition, slot filling accuracy, response coherence, truthfulness, context retention, deadlock avoidance, and conciseness. It also estimates user satisfaction indirectly by correlating failures with negative experiences.
- Conversation Insights: Measures customer satisfaction and effort with inferred CSAT scores, effort scores, resolution status, frustration, confusion, transfers/escalations, empathy, and clarity of next steps. It provides a lightweight, cost-free overview of end-user perception.
Together, these tools deliver a comprehensive view of virtual agent health—from technical accuracy to customer satisfaction—supporting targeted improvements and effective management through AI Agent Analytics and AI Control Tower dashboards.
Evaluate random conversations by enabling continuous monitoring.
Role required: admin
Enable evaluations and set the number of evaluations to be performed daily.
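The effect of a daily cap on random evaluations can be illustrated with a short sketch. This is not the product's implementation: the function name, the `seed` parameter, and the conversation ID format are hypothetical, and the sketch only assumes that the daily limit works like an upper bound on a random sample.

```python
import random
from typing import List, Optional, Sequence


def pick_conversations_for_evaluation(
    conversation_ids: Sequence[str],
    max_evaluate_count: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Return a random subset of conversations, capped at the daily limit.

    Hypothetical helper: max_evaluate_count stands in for the value of the
    sn_na_conv_eval.maxEvaluateCount property.
    """
    if max_evaluate_count <= 0:
        return []
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    # Never ask for more items than exist; sample() picks without replacement.
    sample_size = min(max_evaluate_count, len(conversation_ids))
    return rng.sample(list(conversation_ids), sample_size)
```

With ten completed conversations and a limit of three, the helper returns three distinct IDs. With a limit larger than the number of conversations, it returns all of them in random order.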
Enable evaluations
- Activate Skills:
- Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform.
- Turn on the following skills:
- Intent Accuracy Chat Eval
- Inadequate Slot Filling Chat Eval
- Smoothness (Deadlock avoidance)
- Context Retention
- Coherence Chat Evaluation
- Truthfulness Hallucination Chat Eval
- Conciseness Chat Eval
- Chat topic classifier
Note: You can get the filtered list using the filter condition Conversation Evaluator under Features.
- Set the value for the system property sn_na_conv_eval.maxEvaluateCount.
- Navigate to All > System Properties.
- Search for and select the property sn_na_conv_eval.maxEvaluateCount.
- Update the Value field to set the maximum number of conversations to be evaluated daily.
- Select Save.
- Activate the following associated scheduled jobs:
- Navigate to .
- Apply the filter condition Application is Conversation Evaluator and filter out the job Evaluation Value Calcuation - Runs Only once after install.
- Activate all the scheduled jobs in the filtered list.
- Activate the Execute Evaluation flow in Flow Designer.
Note: By default, the Execute Evaluation flow is deactivated. You can use the nightly scheduled job, Execute Evaluations, to evaluate the chats instead. The nightly job won't crowd out LLM calls from other services, whereas real-time evaluation through the Execute Evaluation flow might conflict with LLM calls from other applications.
If you want to evaluate chats in real time on chat completion, activate the Execute Evaluation flow. Domain separation is not supported.
- Navigate to and select Flows.
- Select the Execute Evaluation flow.
- Select Edit flow.
- Select Activate.
- If you want to configure some of the evaluation parameters based on your requirements, see Configuring evaluations.
- If you want to import historical data to be evaluated, you must run batch evaluations by activating the Execute Batch Evaluation flow. For more information on the batch evaluation workflow, see .
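If you prefer to script the property update instead of using the form, the following is a minimal sketch that builds the two REST Table API requests involved: one to look up the `sys_properties` record by name and one to patch its `value` field. It assumes the Table API is enabled and that your account is allowed to write to `sys_properties`. The instance URL and the helper names are hypothetical. The sketch only constructs the requests. Authentication and sending are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Property named in the steps above.
PROPERTY_NAME = "sn_na_conv_eval.maxEvaluateCount"


def build_lookup_request(instance_url: str, name: str = PROPERTY_NAME) -> Request:
    """Build a GET request that finds the property record by its name."""
    query = urlencode({
        "sysparm_query": f"name={name}",
        "sysparm_fields": "sys_id,name,value",
        "sysparm_limit": "1",
    })
    url = f"{instance_url.rstrip('/')}/api/now/table/sys_properties?{query}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def build_update_request(instance_url: str, sys_id: str, max_count: int) -> Request:
    """Build a PATCH request that sets the daily evaluation limit."""
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    # Property values are stored as strings.
    body = json.dumps({"value": str(max_count)}).encode("utf-8")
    url = f"{instance_url.rstrip('/')}/api/now/table/sys_properties/{sys_id}"
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return Request(url, data=body, headers=headers, method="PATCH")
```

To use it, send the lookup request with your preferred HTTP client and credentials, read the `sys_id` from the first result, and then send the update request. Whether a property change made this way behaves the same as saving the form in the UI is an assumption you should verify on a non-production instance.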
Evaluation dashboard vs. Conversation Insights
You can use the Evaluation dashboard and the Conversation Insights (CI) application together to gain a complete picture of virtual agent effectiveness, from system performance to end-user satisfaction.
For more information about Conversation Insights, see Conversation Insights.
| Metrics captured by the Evaluation dashboard | Metrics captured by Conversation Insights |
|---|---|
| The Evaluation dashboard provides granular diagnostic explanations that help improve virtual agent design, dialog flows, and model accuracy. It evaluates performance along dimensions critical to task success and trustworthiness. For example, "Is the system working properly and performing the expected task?" | Conversation Insights focuses on measuring customer satisfaction and effort. It uses inferred customer satisfaction (CSAT) and supporting signals to show how end users perceive their interaction with the virtual agent. For example, "Is the end user happy with the virtual agent's performance?" |
- Conversation Insights offers a lightweight, cost-free view of the customer experience across all conversations.
- The Evaluation dashboard delivers granular, task-focused diagnostics that enable targeted improvements to virtual agent design and performance.
- Consolidated in AI Agent Analytics and AI Control Tower dashboards, these metrics give users complementary views into virtual agent system health and end-user satisfaction.