Enabling evaluations

  • Release version: Zurich
  • Updated September 2, 2025
  • 3 minutes to read
  • Summarize
    Summarized using AI
    This content was generated using new OpenAI-powered functionality. Results are provided on an as is basis and are not guaranteed to be accurate or complete.

    Summary of Enabling evaluations

    This feature enables continuous monitoring and evaluation of virtual agent conversations in ServiceNow, helping administrators assess and improve virtual agent performance daily. It supports random conversation evaluations to diagnose and enhance key aspects of virtual agent interactions, ensuring effective task completion and reliable user experiences.

    Show full answer Show less

    Enabling Evaluations

    • Activate Required Skills: Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform and enable specific skills such as Intent Accuracy, Slot Filling, Coherence, Truthfulness, Conciseness, and others related to chat evaluation.
    • Set Evaluation Limits: Configure the system property snnaconveval.maxEvaluateCount to define the maximum number of daily conversation evaluations by updating its value under All > System Properties.
    • Schedule Jobs: Activate scheduled jobs filtered by the Conversation Evaluator application except the one that runs only once after installation.
    • Execute Evaluation Flow: Activate the Execute Evaluation flow in Flow Designer to enable real-time evaluation on chat completion. Alternatively, use the nightly scheduled job for batch evaluations to avoid conflicts with other LLM services.
    • Important Notes: Domain separation is not supported. Batch evaluations require activating the Execute Batch Evaluation flow. Custom evaluation parameters can be configured as needed.

    Evaluation Dashboard vs. Conversation Insights

    These two tools provide complementary views of virtual agent effectiveness:

    • Evaluation Dashboard: Focuses on granular diagnostic metrics that assess virtual agent system performance in critical areas such as intent recognition, slot filling accuracy, response coherence, truthfulness, context retention, deadlock avoidance, and conciseness. It also estimates user satisfaction indirectly by correlating failures with negative experiences.
    • Conversation Insights: Measures customer satisfaction and effort with inferred CSAT scores, effort scores, resolution status, frustration, confusion, transfers/escalations, empathy, and clarity of next steps. It provides a lightweight, cost-free overview of end-user perception.

    Together, these tools deliver a comprehensive view of virtual agent health—from technical accuracy to customer satisfaction—supporting targeted improvements and effective management through AI Agent Analytics and AI Control Tower dashboards.

    Evaluate random conversations by enabling continuous monitoring.

    Role required: admin

    Enable evaluations and set the number of evaluations to be performed daily.

    Enable evaluations

    1. Activate Skills:
      1. Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform.
      2. Turn on the following skills:
        • Intent Accuracy Chat Eval
        • Inadequate Slot Filling Chat Eval
        • Smoothness (Deadlock avoidance)
        • Context Retention
        • Coherence Chat Evaluation
        • Truthfulness Hallucination Chat Eval
        • Conciseness Chat Eval
        • Chat topic classifier
        Note:
        You can get the filtered list using the filter condition Conversation Evaluator under Features.
    2. Set the value for the system property sn_na_conv_eval.maxEvaluateCount.
      1. Navigate to All > System Properties > All Properties.
      2. Search for and select the property sn_na_conv_eval.maxEvaluateCount.
      3. Update the Value field to set the maximum number of conversations to be evaluated daily.

        Update the Value field.

      4. Select Save.
    3. Activate the following associated scheduled jobs:
      1. Navigate to All > System Definition > Scheduled Jobs.
      2. Apply the filter condition Application is Conversation Evaluator and filter out the job Evaluation Value Calcuation - Runs Only once after install.

        Apply filter condition.

      3. Activate all the scheduled jobs in the filtered list.
    4. Activate the Execute Evaluation flow in Flow Designer.
      Note:

      By default, the Execute Evaluation flow is deactivated. You can use the nightly scheduled job, Execute Evaluations, to evaluate the chats. The nightly job won't dominate over LLM calls from other services, whereas the real-time evaluation through the Execute Evaluation flow might conflict with LLM calls from other applications.

      If you want to evaluate the chats real-time on chat completion, activate the Execute Evaluation flow. Domain separation is not supported.

      1. Navigate to All > Process Automation > Flow Designer and select Flows.
      2. Select the Execute Evaluation flow.
      3. Select Edit flow.
      4. Select Activate.

        Activate the Execute Evaluation flow.

    Note:
    • If you want to configure some of the evaluation parameters based on your requirements, see Configuring evaluations.
    • If you want to import historical data to be evaluated, you must run batch evaluations by activating the Execute Batch Evaluation flow. For more information on the batch evaluation workflow, see .

    Evaluation dashboard vs. Conversation Insights

    You can use the Evaluation dashboard and the Conversation Insights (CI) application together to gain a complete picture of virtual agent effectiveness, from system performance to end-user satisfaction.

    For more information about Conversation Insights, see Conversation Insights.

    Metrics captured by the Evaluation dashboard Metrics captured by Conversation Insights

    The Evaluation dashboard provides granular diagnostic explanations that help improve virtual agent design, dialog flows, and model accuracy. It evaluates performance along dimensions critical to task success and trustworthiness. For example, "Is the system working properly and performing the expected task?"

    • Intent Recognition: Whether the virtual agent correctly interprets the user's primary goal.
    • Slot Filling: Accuracy of extracting structured inputs required to complete tasks.
    • Conciseness: Avoidance of verbose or repetitive responses.
    • Coherence: Logical flow and organization of the virtual agent's responses.
    • Truthfulness: Ensures that responses are grounded in context or knowledge sources, not hallucinated.
    • Context Retention: Virtual agent's ability to remember earlier user inputs in the same session.
    • Deadlock Avoidance: Detects loops where the virtual agent gets stuck repeating questions or responses.
    • User Satisfaction (proxy): Correlates critical failures (like intent recognition or slot filling) with likely negative user perception.

    Conversation Insights focuses on measuring customer satisfaction and effort. It uses inferred customer satisfaction (CSAT) and supporting signals to show how end users perceive their interaction with the virtual agent. For example "Is the end user happy with the virtual agent's performance?"

    • Inferred CSAT: Composite score (1–5) estimating overall satisfaction with the conversation.
    • Effort Score: Measures how much work the user had to do (low/medium/high), based on transfers, escalations, repeated data collection, or wait times.
    • Resolution: Tracks whether the user's issue was fully, partially, or not resolved.
    • Frustration: Detects explicit signs of user frustration such as rude language, sarcasm, or complaints about difficulty.
    • Confusion: Identifies misunderstandings between the user and virtual agent or the live agent.
    • Transfers and Escalations: Flags when a conversation is passed to another agent, team, or supervisor.
    • Empathy: Scores how polite, personable, and supportive the virtual agent or live agent was (low/medium/high).
    • Next Steps: Measures how clearly the virtual agent or live agent explained outcomes, instructions, or follow-up actions (low/medium/high).
    Together, the Evaluation dashboard and Conversation Insights provide complementary value for virtual agent implementations.
    • Conversation Insights offers a lightweight, cost-free view of the customer experience across all conversations.
    • The Evaluation dashboard delivers granular, task-focused diagnostics that enable targeted improvements to virtual agent design and performance.
    • Consolidated in AI Agent Analytics and AI Control Tower dashboards, these metrics give users complementary views into virtual agent system health and end-user satisfaction.