Evaluation flow for batch evaluations
Batch evaluation enables Eval admins to evaluate up to 100 completed virtual agent conversations at once, based on a saved query.
Flow name: Execute Batch Evaluation.
The flow creates evaluation records and invokes Now Assist skills for each eligible conversation, mirroring the single-conversation evaluation logic, but at scale. It enforces HR scope exclusions, topic/category validation, transcript construction rules, early live-agent exclusions, and asynchronous scoring through skills.
Batch evaluations are performed using the following logic:
Trigger
- Table: Evaluation set [sn_na_conv_eval_evaluation_set]
- Condition: State changes to In Progress and Evaluation type = Conversation
Inputs
- Evaluation Set record with:
- Query filter: A query that targets conversations to be evaluated (for example, sys_cs_conversation filters).
- Evaluation type: Conversation
- State: In Progress (to start)
- LLM/Skills: Chat Topic Classifier, plus the evaluation skills listed after this.
High-level behavior
- Reads the query filter and randomly samples up to 100 conversations.
- Skips already-evaluated conversations.
- Excludes HR-scoped interactions.
- Uses Chat Topic Classifier to validate evaluation eligibility and extracts Topic and Category.
- Builds a transcript with controlled inclusion of Knowledge articles and catalog sources, and applies early live agent exclusions.
- Creates an Evaluation record and asynchronously invokes all selected evaluation skills, writing scores and rationale to metrics.
Sequence of execution:
Action 1: If the query filter isn’t empty
- Purpose: Guard clause.
- Logic: Look up the Evaluation Set record and check the query filter field.
- If the query filter is present: Proceed to Action 2.
- If empty: Stop and optionally log No query provided.
Action 2: Randomize conversations
- Purpose: Select a bounded, random sample of conversations from the provided query.
- Logic:
- Execute the query to get matching conversation records.
- Randomly select up to 100 conversations.
- If >100 matches, cap at 100.
- If <100, select all.
- Validate the query; if invalid, return false and an empty or partial array.
- Outputs:
- success: true/false
- conversation_ids: array of sys_ids (max 100)
- If success = true: Proceed to Action 3; otherwise, stop and log the validation error.
Action 3: Look up the evaluation table to check prior evaluation
- Purpose: Avoid duplicate evaluations.
- Logic: For each conversation sys_id, check sn_na_conv_eval_evaluation for existing records indicating that it's already evaluated or is in progress (implementation choice: state not in canceled/failed).
- If not previously evaluated: Proceed to Action 4 for that conversation.
- If already evaluated: Skip this conversation, optionally log Already evaluated.
Action 4: Look up the interaction record
- Purpose: Enforce HR scope exclusion.
- Logic: Resolve the interaction related to the conversation. If its application scope contains hr, skip the conversation.
- If the scope doesn’t contain hr: Proceed to Action 5.
Action 5: buildTranscript
- Purpose: Construct the final, minute-level transcript and determine downstream skill set and guardrails.
- Steps:
- Aggregate all conversation messages.
- Tag user messages as [User]: and virtual agent messages as [Virtual Agent]:.
- Knowledge articles:
- If genius results reference Knowledge articles, query the Knowledge article and replace the genius snippet with the entire article body.
- Annotate with [Virtual Agent]: Help articles for user query: and wrap content within Article_Start and Article_End.
- Constraints:
- If the KB is HR-scoped or inaccessible, don't evaluate (skip conversation).
- Truncate the article body to a maximum of 10,000 words.
- If the KB content source is attached files (PDF/Word/Txt), fall back to the genius result instead of full file content.
- Catalog Items:
- If genius results reference catalog items, query sc_cat_item and build a string: catalog name, short description, description.
- Annotate with [Virtual Agent]: Please choose one of the below options: and include citation order.
- Live Agent Exclusions:
- If the first user message requests a live agent, skip evaluation.
- If a live agent is invoked within the first 120 words, skip evaluation.
- Outputs:
- ExecuteEvaluation: true/false (post-guardrail outcome)
- Chat transcript
- Knowledge articles referred
- Catalog items referred
- First live agent occurrence: Sys_id of the conversation message (if present)
- Skills to invoke:
- Coherence Chat Evaluation
- Conciseness Chat Eval
- Context Retention
- Inadequate Slot Filling Chat Eval
- Intent Accuracy Chat Eval
- Smooth Flowing Conversation Chat Eval
- Truthfulness Hallucination Chat Eval
- Additional logs
- If ExecuteEvaluation = true: Proceed to Action 7; otherwise, skip the conversation.
Action 6: If Block
- Purpose: Branch to record creation.
- Logic: If ExecuteEvaluation from Action 6 is true, go to Action 8.
Action 7: Chat Classifier Eval
- Purpose: Validate whether the conversation should be evaluated and extract high-level labels.
- Logic:
- Build a lightweight transcript from sys_cs_message for classification input.
- Invoke Chat topic classifier skill with the transcript.
- Receive:
- Execute evaluation: true/false
- Topic Name
- Category: IT or HR
- If Execute evaluation = true: Proceed to Action 6.
- If false: Skip conversation and log the classifier decision.
Action 8: Create or Update evaluation record
- Purpose: Persist an evaluation entry for this conversation.
- Table: sn_na_conv_eval_evaluation
- Field population:
- Document conversation: Conversation reference
- State: processing
- Topic: from Action 5
- Category: from Action 5
- KB Referred: from Action 6
- Catalog Referred: from Action 6
- First live agent occurrence: from Action 6
- Type: chat summarization
- User: initiating user for the conversation
- Message log: Additional logs from Action 6
- On success: Proceed to Action 9.
Action 9: For Loop over skills
- Purpose: Execute each selected evaluation skill.
- For each skill in the list from Action 6:
- Action 10: invokeApiDefinition
- Inputs: Skill Name, Conversation, Transcript, Evaluation Id
- Behavior:
- Invoke the Now Assist skill asynchronously.
- The post-processor writes results into sys_generative_ai_response_validator.
- Extract JSON response fields:
- Score
- Reason for Score
- Examples supporting the reasoning
- Create child metric records in sn_na_conv_eval_evaluation_metrics linked to the parent evaluation.
- Action 11: Wait
Pause seven seconds before proceeding to the next skill to manage rate limits or throttling.
- Action 10: invokeApiDefinition