Human feedback for evaluations

Zurich Enable AI


Release version: Zurich

Updated August 13, 2025


Expand the Human feedback section to see details on evaluations and their satisfaction scores.

Role required: interaction_admin

Note:

To label a record or a conversation, you must have access to its transcript. Only users with the interaction_admin role can access transcripts.


The Evaluations section lists all the chats that the large language model (LLM) evaluated automatically. You can also evaluate conversations manually and compare the AI evaluations with your own assessment of how well the agent handled each conversation.


Field	Description
Number	Evaluation number assigned to each chat conversation. Select the evaluation number to open the chat and its evaluations.
State	State of the evaluation.
Auto Eval user satisfaction Score	Satisfaction score calculated automatically by the LLM.
Human User Satisfaction Score	Satisfaction score calculated from a user's manual evaluation of the chat.
Gap	Difference between the human and auto-evaluated satisfaction scores.
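The Gap column is simple arithmetic on the two score columns. The following is a minimal sketch, not the product's implementation: it assumes both scores are numbers on the same scale and that Gap is computed as human score minus auto score (the actual sign convention is not documented here). The `Evaluation` record, its field names, and the `largest_disagreements` helper are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evaluation:
    """Hypothetical record mirroring the columns of the Evaluations list."""
    number: str
    auto_score: float                      # Auto Eval user satisfaction Score
    human_score: Optional[float] = None    # Human User Satisfaction Score, once labeled


def gap(ev: Evaluation) -> Optional[float]:
    """Human minus auto score (assumed sign convention); None if not yet labeled."""
    if ev.human_score is None:
        return None
    return ev.human_score - ev.auto_score


def largest_disagreements(evals: list[Evaluation], top_n: int = 3) -> list[Evaluation]:
    """Return labeled evaluations sorted by absolute gap, largest first."""
    labeled = [e for e in evals if e.human_score is not None]
    return sorted(labeled, key=lambda e: abs(gap(e)), reverse=True)[:top_n]
```

Sorting by the absolute gap surfaces the conversations where the LLM and the human reviewer disagree most, which are usually the most useful ones to review first.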

Manually evaluate a chat

You can manually evaluate each chat to compare it with the AI evaluation.

1. Select the evaluation that you want to score manually.
2. Turn on the View Auto Eval Scores switch to see the AI evaluation for each category.
3. For each category, select the answer that best reflects how accurately the agent responded.
4. Turn on the Other Metrics switch for a more detailed evaluation.
5. When you have finished, select Submit.

The Human User Satisfaction Score is calculated from your responses to the questions. To review your responses for an evaluation, select the evaluation and then select View Human Scores. Use the Export option to export the data in your preferred format.

You can also label evaluations at random by selecting Label random scores. This loads a list of 10 randomly chosen, unevaluated conversations from the past 10 days for manual labeling.
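The selection that Label random scores performs can be sketched as a filter followed by a random sample. This is an illustrative sketch, not the product's logic: the `Conversation` record, its fields, and the `pick_random_for_labeling` function are hypothetical, and returning fewer than 10 conversations when fewer candidates exist is an assumption about edge-case behavior.

```python
import random
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Conversation:
    """Hypothetical conversation record; field names are illustrative."""
    conversation_id: str
    created: datetime
    human_labeled: bool


def pick_random_for_labeling(
    conversations: list[Conversation],
    now: datetime,
    sample_size: int = 10,
    window_days: int = 10,
    rng: Optional[random.Random] = None,
) -> list[Conversation]:
    """Sample up to sample_size unlabeled conversations created in the last window_days."""
    rng = rng or random.Random()
    cutoff = now - timedelta(days=window_days)
    # Keep only conversations that have not been labeled and fall inside the window.
    candidates = [
        c for c in conversations
        if not c.human_labeled and cutoff <= c.created <= now
    ]
    # random.sample raises ValueError if k exceeds the population size, so cap k.
    return rng.sample(candidates, k=min(sample_size, len(candidates)))
```

Sampling without replacement guarantees that the same conversation never appears twice in one labeling batch.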