Multi-model Batch Testing
Test multiple Natural Language Understanding (NLU) models against a large set of utterances to evaluate the performance of the models. Add test sets, test multiple models, and see test results.
Summary usage
Use Multi-model Batch Testing to create and upload test sets made up of utterances and their expected intents, and then run those test sets against your NLU models.
Multi-model Batch Testing works with models for all supported NLU languages. See NLU language support.
Installation
Multi-model Batch Testing is part of the NLU Workbench - Advanced Features app available on the ServiceNow® Store.
To use Multi-model Batch Testing, ensure that the NLU Workbench - Advanced Features (com.snc.nlu.workbench.advanced) plugin is active on your instance. For more information, see Install NLU Workbench - Advanced Features and Activate the NLU Workbench.
Test sets
Test sets are lists of utterances and matched intents. Create a test set by using a table in a CSV or XLSX (Excel workbook) file. The table should contain two columns: one for utterances, and one for the expected intent. Your test set can include up to 10,000 rows.
To get the most out of testing your NLU models, include utterances that the model is likely to encounter from your users. Test utterances should be in the same language as the model being tested. Also include utterances with no expected intent: they help you assess the model's ability to detect irrelevant utterances, for which no intent should be predicted.
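The two-column layout and row limit described above can be sketched in Python. This is a minimal illustration, not the product's required format: the header names `utterance` and `expected_intent`, the sample intent names, and the choice to mark "no expected intent" with an empty cell are all assumptions, as is the presence of a header row.

```python
import csv
import io

MAX_ROWS = 10_000  # row limit stated in this section


def build_test_set(rows):
    """Return CSV text for (utterance, expected_intent) pairs.

    An empty expected_intent marks an utterance that should match no intent
    (an assumed convention for this sketch).
    """
    if len(rows) > MAX_ROWS:
        raise ValueError(f"test set has {len(rows)} rows; the limit is {MAX_ROWS}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["utterance", "expected_intent"])  # hypothetical header names
    for utterance, intent in rows:
        if not utterance.strip():
            raise ValueError("every row needs a non-empty utterance")
        writer.writerow([utterance.strip(), (intent or "").strip()])
    return buffer.getvalue()


def count_irrelevant(rows):
    """Count rows that have no expected intent."""
    return sum(1 for _, intent in rows if not (intent or "").strip())


# Hypothetical sample data; the intent names are made up for illustration.
sample_rows = [
    ("I forgot my password", "ResetPassword"),
    ("My laptop will not turn on", "HardwareIssue"),
    ("What is the weather today?", ""),  # irrelevant: no intent expected
]
csv_text = build_test_set(sample_rows)
```

Checking `count_irrelevant(sample_rows)` before uploading is one way to confirm that the test set contains at least some utterances with no expected intent.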
To create a test set, see Create a test set.
After you have a test set, you can test trained NLU models. To begin testing, see Run a multi-model batch test.
After running a test, your results appear on the Test results page.
Test results
The Test results page lists your completed and in-progress tests. At a glance, the results page shows the models that were tested, the number of utterances, and prediction percentages.
To see the details of a test result, click the name of the test set.
The Overview page shows summary information about the results and includes a graphic with a breakdown of predictions.
The Intents that need attention (Current model) section shows the top 5 missed and incorrect intents. Click an intent name to drill down into the test utterances that were predicted incorrectly. Use this information to improve the model.
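To illustrate the idea behind a "top 5" ranking, the following Python sketch counts problem predictions per expected intent. It is not how the product computes the list. It assumes that "missed" means an intent was expected but none was predicted, and that "incorrect" means a different intent was predicted; the function name and sample intents are hypothetical.

```python
from collections import Counter


def intents_needing_attention(results, top_n=5):
    """Rank expected intents by how often they were missed or mispredicted.

    results: iterable of (expected_intent, predicted_intent) pairs, where an
    empty string means "no intent".
    """
    missed = Counter()     # an intent was expected, none was predicted
    incorrect = Counter()  # an intent was expected, a different one was predicted
    for expected, predicted in results:
        if not expected or expected == predicted:
            continue  # nothing expected, or predicted correctly
        if predicted:
            incorrect[expected] += 1
        else:
            missed[expected] += 1
    return missed.most_common(top_n), incorrect.most_common(top_n)


# Hypothetical results for a single model.
sample_results = [
    ("ResetPassword", "ResetPassword"),
    ("ResetPassword", ""),
    ("ResetPassword", ""),
    ("HardwareIssue", "ResetPassword"),
    ("HardwareIssue", ""),
    ("", "ResetPassword"),
]
top_missed, top_incorrect = intents_needing_attention(sample_results)
```

The last sample row expects no intent, so it is skipped by this ranking even though the model predicted one.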
The Detailed results tab lists information about each utterance that was tested. From here, you can see the prediction outcome and confidence per model for each utterance. Filter the results by using the search bar or interacting with the filter tools and column headers.
You can also export the test results to a CSV file by clicking Export. The file includes the same columns as the detailed results page.
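Once you have the exported file, you can compare models offline. The Python sketch below computes the share of correct predictions per model. The column names (`expected_intent`, `model`, `predicted_intent`, `confidence`) and the one-row-per-utterance-and-model layout are assumptions made for illustration; check the header of your actual export and adjust the names accordingly.

```python
import csv
import io
from collections import defaultdict


def accuracy_by_model(csv_text):
    """Return {model name: fraction of utterances predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        model = row["model"]  # hypothetical column name
        total[model] += 1
        # An empty expected and empty predicted intent counts as correct:
        # the model rightly predicted nothing for an irrelevant utterance.
        if row["predicted_intent"].strip() == row["expected_intent"].strip():
            correct[model] += 1
    return {model: correct[model] / total[model] for model in total}


# Hypothetical export contents; real exports may use different columns.
exported = """utterance,expected_intent,model,predicted_intent,confidence
I forgot my password,ResetPassword,Model A,ResetPassword,0.91
I forgot my password,ResetPassword,Model B,HardwareIssue,0.55
What is the weather today?,,Model A,,0.00
What is the weather today?,,Model B,ResetPassword,0.62
"""
scores = accuracy_by_model(exported)
```

For a real file, read the text with `open(path, newline="", encoding="utf-8")` and pass its contents to `accuracy_by_model`.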
For more information on understanding your test results, see Test and publish your model.