Evaluate a prompt
Use the Now Assist Skill Kit evaluation tools to measure how effectively your skill prompts perform.
Before you begin
Role required: sn_skill_builder.admin
Procedure
- Navigate to All > Now Assist Skill Kit > Home.
- Select the skill that you want to evaluate.
- Select the Prompt performance tab.
- Select the Evaluation runs tab.
- Create a dataset from a table or a data collection.
Table 1. Create a dataset
Create a dataset from a table:
  - Give the dataset a name and description.
  - Select Table.
  - Find the table that you want to use.
  - Select the maximum number of records that you want to use.
  - Add conditions.
  - Select Generate Preview.
  - Select the mappings.
  - Select Create.
Create a dataset from a data collection:
  - Give the dataset a name and description.
  - Select Data Collection.
  - Select a data collection that you created in Now Assist Data Kit.
  - Select Generate Preview.
  - Select the mappings.
  - Select Create.
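One way to picture what the table-based steps above produce: records are filtered by your conditions, capped at the maximum record count, and their fields are mapped to the inputs the prompt expects, which is what the preview shows. The following Python sketch models that flow on in-memory data only. The record fields, the `conditions` predicates, the `mappings` dictionary, and the `build_dataset_preview` function are hypothetical illustrations, not the Now Assist Skill Kit API or a description of its internal behavior.

```python
from typing import Callable

# Hypothetical shapes: a record is a dict of field name -> value.
Record = dict[str, str]
Condition = Callable[[Record], bool]


def build_dataset_preview(
    records: list[Record],
    conditions: list[Condition],
    max_records: int,
    mappings: dict[str, str],
) -> list[dict[str, str]]:
    """Hypothetical model of dataset generation: filter, cap, then map fields."""
    # Keep only the records that satisfy every condition.
    matching = [r for r in records if all(cond(r) for cond in conditions)]
    # Respect the maximum number of records.
    selected = matching[:max_records]
    # Map each prompt input name to the value of its source field.
    return [
        {prompt_input: record[field] for prompt_input, field in mappings.items()}
        for record in selected
    ]


if __name__ == "__main__":
    incidents = [
        {"number": "INC001", "state": "closed", "description": "VPN drops hourly"},
        {"number": "INC002", "state": "open", "description": "Printer offline"},
        {"number": "INC003", "state": "closed", "description": "Email sync fails"},
    ]
    preview = build_dataset_preview(
        incidents,
        conditions=[lambda r: r["state"] == "closed"],
        max_records=1,
        mappings={"input_text": "description"},
    )
    print(preview)  # [{'input_text': 'VPN drops hourly'}]
```

Applying the conditions before the record cap means the cap counts only matching records, so a small maximum still yields records that satisfy every condition.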
- Select the add icon for Evaluation Runs.
- Give the evaluation run a name and description.
- Select one or more prompts that you want to evaluate.
- Select Save & Next.
- Select a dataset.
- Select Save & Next.
- Expand the Quality tab.
- Select the metrics to use in the evaluation.
Table 2. Evaluation metrics
Human Feedback (evaluation method: Human)
  Human evaluation is the default option for all prompt executions that generate a response. You rate each response with a thumbs up or thumbs down based on how satisfied you are with it. You can also provide more detailed feedback to explain your rating.
Correctness (evaluation method: Automated)
  The correctness metric assesses the generated response's accuracy, completeness, pertinence, and writing quality relative to the given instruction. This metric helps confirm that the text accurately reflects the instruction, covers all important points, remains relevant, and is well written.
Correctness with Golden Response (evaluation method: Automated)
  The correctness with golden response metric uses a predefined reference response, the golden response, to assess the generated response's accuracy, completeness, pertinence, and writing quality relative to the given instruction. This metric helps confirm that the text accurately reflects the instruction, covers all important points, remains relevant, and is well written. Use this metric whenever possible.
Faithfulness (evaluation method: Automated)
  The faithfulness metric assesses whether a generated response accurately reflects the information and context provided in the given instruction. This metric helps confirm that the text contains no hallucinations, fabricated facts, or unsupported conclusions, and that it stays aligned with the source material.
- Select Save & Next.
- Review the evaluation choices that you made.
- Select Save & Evaluate.
- Optional: Give a human evaluation.
- Select Human evaluation.
- Select a record to use in the evaluation.
- Expand the prompt and read the result.
- Select the thumbs up or thumbs down icon to give your evaluation.
- Add more information and select Submit.
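When an evaluation run covers more than one prompt, it can help to compare how often reviewers gave each prompt a thumbs up. The following Python sketch tallies hypothetical human feedback entries into a per-prompt approval rate. The `Feedback` class, its fields, and the `approval_rates` function are illustrative assumptions, not data structures or reports that Now Assist Skill Kit provides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    """One hypothetical human evaluation of a prompt's response."""
    prompt: str
    thumbs_up: bool
    comment: str = ""


def approval_rates(entries: list[Feedback]) -> dict[str, float]:
    """Return the share of thumbs-up ratings for each prompt."""
    totals: dict[str, int] = defaultdict(int)
    ups: dict[str, int] = defaultdict(int)
    for entry in entries:
        totals[entry.prompt] += 1
        if entry.thumbs_up:
            ups[entry.prompt] += 1
    # Only prompts that received at least one rating appear in the result.
    return {prompt: ups[prompt] / count for prompt, count in totals.items()}


if __name__ == "__main__":
    feedback = [
        Feedback("Prompt A", True),
        Feedback("Prompt A", False, "Missed the resolution steps"),
        Feedback("Prompt B", True),
        Feedback("Prompt B", True),
    ]
    print(approval_rates(feedback))  # {'Prompt A': 0.5, 'Prompt B': 1.0}
```

A rate alone hides why a response was rated down, so the optional comments remain the more useful signal when deciding how to revise a prompt.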