Components installed with the Evaluation dashboard

  • Release version: Yokohama
  • Updated August 13, 2025
  • 3 minutes to read

    Several types of components are part of the Evaluation tab, including scheduled jobs, tables, system properties, and flows.

    Scheduled jobs installed

    Scheduled job Description

    CE Populate Value Aggregates Chats – Daily

    This scheduled script runs daily and randomly selects 1000 conversations from the previous day's conversations. For each selected conversation, the job extracts the chat duration and classifies the chat as small, medium, or large. It also identifies the chats in which a Knowledge article or catalog item was invoked. For evaluated chats, it classifies the conversations by chat performance and populates that data into the Evaluation Value Aggregates table.

    Evaluation Value Calculation - Runs Only once after install

    Deletes all the records in the Evaluation Value Aggregates table, runs the calculations again, and stores the aggregated values in the Evaluation Value Aggregates table. The data is from the first evaluation date.
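The sampling step of the daily job can be sketched as follows. This is a minimal illustration in Python, not the platform script itself: the function name, the `started` field, and the in-memory list of conversations are all assumptions made for the example, and the default sample size of 1000 mirrors the documented behavior.

```python
import random
from datetime import date, datetime, timedelta


def sample_yesterdays_conversations(conversations, today, sample_size=1000, seed=None):
    """Return up to `sample_size` randomly chosen conversations from the day before `today`.

    `conversations` is a hypothetical list of dicts, each with a `started`
    datetime; the real job reads conversation records from the instance.
    """
    yesterday = today - timedelta(days=1)
    candidates = [c for c in conversations if c["started"].date() == yesterday]
    rng = random.Random(seed)
    # Never ask for more items than exist, or random.sample raises ValueError.
    count = min(sample_size, len(candidates))
    return rng.sample(candidates, count)


if __name__ == "__main__":
    demo = [{"sys_id": str(i), "started": datetime(2025, 8, 12, 9, i)} for i in range(5)]
    print(len(sample_yesterdays_conversations(demo, date(2025, 8, 13), sample_size=3)))
```

Sampling without replacement keeps each conversation from being counted twice in the aggregates; whether the installed job samples this way is not stated in this topic.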

    Tables installed

    Label Name
    Evaluation

    [sn_na_conv_eval_evaluation]

    Evaluation configurations

    [sn_na_conv_eval_evaluation_configurations]

    Evaluation Metrics

    [sn_na_conv_eval_evaluation_metrics]

    Evaluation Set

    [sn_na_conv_eval_evaluation_set]

    Evaluation Value Aggregates

    [sn_na_conv_eval_evaluation_value_aggregates]

    Remote tables installed

    Table Description

    Conversation Evaluator Value Calculations

    [sn_na_conv_eval_st_value_calcs]

    For the given query, the definition for this remote table calculates the time savings and efficiency percentage for small, medium, and large chats. Also, it returns the time savings and efficiency when a Knowledge article or catalog item was invoked.

    Conversation weekly calculations

    [sn_na_conv_eval_weekly_cals]

    For the given query, the definition for this remote table calculates the time savings and efficiency percentage for small, medium, and large chats for each week of the selected date range. It also returns the time savings and efficiency when a Knowledge article or catalog item was invoked, for each week of the selected date range.

    System properties installed

    Property Description

    sn_na_conv_eval.errorBandMinRecords

    Minimum number of records required to calculate the error band for upper and lower deviation. By default, the value is 30.

    sn_na_conv_eval.evalWeights

    Contains the weight of each evaluation metric for chat evaluation. This property is used to compute total or composite scores for evaluation records.

    sn_na_conv_eval.maxEvaluateCount

    Maximum number of records to evaluate in a day. By default, the value is 200.

    sn_na_conv_eval.total_sampled_conv_count

    Edit this property to control the total number of conversations that can be sampled for value calculations. By default, the value is 1000.

    sn_na_conv_eval.value_chat_classifier

    Edit this property to change the definition of small, medium, and large conversations. By default, the value it stores is 4, 10.

    Here, 4 and 10 are thresholds on the total number of inbound messages for a conversation in the sys_cs_message table. A conversation with 4 or fewer inbound messages is a small conversation. A conversation with more than 4 and up to 10 inbound messages is a medium conversation, and a conversation with more than 10 inbound messages is a large conversation.

    sn_na_conv_eval.ce_value_calculation_weights

    Value calculation weight values for each type of evaluated chat.

    sn_na_conv_eval.eval_value_rerun_status

    Reruns the value calculations once after the installation. This property tracks the status of the Conversation Evaluator Value Rerun. After the rerun has run, the script changes the value of this system property to false.
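The size rule described for sn_na_conv_eval.value_chat_classifier can be illustrated with a short sketch. The Python below is an assumption-laden illustration, not the installed script: the function names are hypothetical, and the property value is assumed to be a comma-separated pair of integers such as the default "4, 10".

```python
def parse_thresholds(property_value="4, 10"):
    """Split a value such as "4, 10" into (small_max, medium_max)."""
    small_max, medium_max = (int(part.strip()) for part in property_value.split(","))
    if small_max >= medium_max:
        raise ValueError("the small threshold must be below the medium threshold")
    return small_max, medium_max


def classify_conversation(inbound_message_count, property_value="4, 10"):
    """Classify a conversation by its number of inbound messages."""
    small_max, medium_max = parse_thresholds(property_value)
    if inbound_message_count <= small_max:
        return "small"
    if inbound_message_count <= medium_max:
        return "medium"
    return "large"
```

With the default value, a conversation with 4 inbound messages is classified as small, one with 5 or 10 as medium, and one with 11 as large.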

    Business rules installed

    Name | When | Insert | Update | Filter Conditions
    Add info message for Evaluation set | after | TRUE | TRUE | stateCHANGESTOIn Progress^evaluation_type=conversation^EQ
    Scale Up labeling metric | before | TRUE | TRUE | metric_type=Labeling^metric_nameINhelpfulness_chat_eval,intent_recognition_chat_eval,slot_filling_chat_eval,forgetfulness_chat_eval,hallucination_chat_eval,redundancy_chat_eval,deadlock_chat_eval,coherence_chat_eval^raw_scoreVALCHANGES^EQ
    updateLabelingScoresOnEvaluation | after | TRUE | TRUE | metric_type=Labeling^raw_scoreVALCHANGES^metric_nameINhelpfulness_chat_eval,intent_recognition_chat_eval,slot_filling_chat_eval,forgetfulness_chat_eval,hallucination_chat_eval,redundancy_chat_eval,deadlock_chat_eval,coherence_chat_eval^EQ
    Update deviation scores | before | TRUE | TRUE | metric_type=LLM Generated^scoreVALCHANGES^EQ
    getAutoEvalCompositeScore | after | FALSE | TRUE | stateCHANGESTOComplete^total_scoreISEMPTY^EQ
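The filter conditions above are encoded queries, in which `^` joins conditions with AND and a trailing `EQ` marks the end of the query. To make them easier to read, the following Python sketch splits such a string into field, operator, and value. It is only an illustration: the `Condition` type and `parse_filter` function are hypothetical names, and the sketch recognizes only the operators that appear in this table (`=`, `IN`, `CHANGESTO`, `VALCHANGES`, `ISEMPTY`), assuming field names use lowercase letters and underscores.

```python
import re
from typing import NamedTuple


class Condition(NamedTuple):
    field: str
    operator: str
    value: str


# Field names are lowercase with underscores; operators are uppercase or "=".
_TERM = re.compile(r"^([a-z_]+)(CHANGESTO|VALCHANGES|ISEMPTY|IN|=)(.*)$")


def parse_filter(encoded):
    """Split an encoded filter string into a list of Condition tuples."""
    conditions = []
    for term in encoded.split("^"):
        if term in ("", "EQ"):  # skip empty pieces and the end-of-query marker
            continue
        match = _TERM.match(term)
        if match is None:
            raise ValueError(f"unsupported term: {term!r}")
        conditions.append(Condition(*match.groups()))
    return conditions
```

For example, the filter of the getAutoEvalCompositeScore rule parses into two conditions: `state` changes to `Complete`, and `total_score` is empty.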

    Flows installed

    Flow Description

    Execute Evaluation

    Performs evaluations when conversations are completed.

    By default, the Execute Evaluation flow is deactivated. You can use the nightly scheduled job Execute Evaluations to evaluate the chats. If you want to evaluate the chats on chat completion, activate the Execute Evaluation flow.

    Execute Batch Evaluation

    Performs batch evaluations, evaluating up to 100 completed virtual agent conversations. The flow is triggered when the Evaluation set is created or updated and the Evaluation Type is Conversation.

    Flow actions installed

    Flow action Description

    Randomize conversations

    Randomizes conversations and returns 100 randomly selected conversations from a given query.

    invokeApiDefinition

    Invokes OneExtend Capability in the large language model (LLM).

    Chat Classifier Eval

    Gives the title, category, and whether the evaluation should be executed.

    buildTranscript

    Builds the transcript from a conversation.

    evalExecuteCondition

    Checks if the transcript is good enough to be evaluated.

    Script includes installed

    Script includes Description
    evalExecuteCondition

    Use this script include to update the evaluation condition.

    evalUtils

    Primary utility function for the Evaluator.