
Testing your Natural Language Understanding (NLU) model against a set of utterances is an integral part of ensuring your model performs optimally. The platform provides three primary mechanisms for testing your model during different stages of NLU model and Virtual Agent (VA) topic building, from within the NLU Workbench and Virtual Agent Designer.
- Test single utterances using Try Model from NLU Workbench: Try Model lets you test the model with single utterance samples from within the Workbench. In doing so, it also provides the capability to mark each result as correct or incorrect as feedback to the model. For incorrect results, an NLU admin can also select the correct intent from the available intents, or pick no intent as the expected outcome. This feedback helps further train the model based on the provided inputs.
- Test single utterances from VA Designer: When building VA topics, we can also test utterances from within VA Designer. The NLU model does need to be published for the most recent model changes to factor into the predictions within Designer. Testing from the NLU tab in Designer is similar to testing samples from within the Workbench. In addition, we can also test using the Test Active Topics button in Designer, which lets us test both the NLU model and the VA topic simultaneously.
- Batch Testing tool: The Batch Testing feature, available as part of the Advanced NLU Workbench, allows NLU admins to test the NLU model by uploading a large batch of test utterances and their expected intents to understand how the model is performing and predicting. This in turn helps tune the model based on what the test results tell us.
Refer to the sections below for expert tips and tricks on using Batch Testing to tune your NLU models for optimal performance.
- Creating a Batch Test Set to Assess Model Performance
- Two elements: utterance and expected intent (a minimal sketch of the file shape follows this list)
- Quantity: at least 50 unique test utterances (ideally 100+) per intent. They can be gathered from Open NLU, chat, or incident logs.
- Quality: representative of how end users talk. The test samples should not be exact copies of training utterances. Include 15-20% of samples that should not match any intent.
- Purpose: test fallback behavior and adjust the confidence threshold
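To make the two-element file shape concrete, here is a minimal sketch in Python that writes a small test set as a CSV with utterance and expected-intent columns, including a couple of no-intent samples for fallback testing. The file name, column headers, and sample intents are illustrative assumptions, not an official Batch Testing template; check the import dialog in your instance for the exact format it expects.

```python
import csv

# Hypothetical batch test set: each row pairs an utterance with its
# expected intent; rows with an empty intent should fall back to "no intent".
test_rows = [
    ("I can't log in to my laptop", "Password Reset"),
    ("my password expired again", "Password Reset"),
    ("order a new mouse for my desk", "Order Hardware"),
    ("asdf qwerty zxcv", ""),                  # gibberish -> expect no intent
    ("what's the cafeteria menu today", ""),   # out of scope -> expect no intent
]

# Write the set to a CSV file for import into Batch Testing.
# Column names here are assumptions based on this article's description.
with open("batch_test_set.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Utterance", "Expected intent"])
    writer.writerows(test_rows)
```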
- Guidance for creating Test Sets from Chat Logs
- Identify source of test utterances:
- Post NLU go-live:
- Extract end-user samples from the open_nlu_predict_intent_feedback table's utterance and prediction columns for a specified time period into Excel format, and rename the Prediction column in the Excel file to 'Expected intent'
- Sort the spreadsheet by the Expected intent column and use Excel's unique function to remove any duplicate samples
- Finally, review the Expected intent value for each sample, correct any wrongly predicted intents, and use the result as the test set to import and run Batch Testing (a scripted version of this clean-up is sketched after this list)
- Pre NLU go-live:
- Where available, use interaction.short_description or interaction_log.utterance data to gather test samples
- Create a Batch Testing file with these utterances, leaving the Expected intent column empty, and run Batch Testing
- Validate the predictions on the detailed Batch Testing results page
- As you review, update the 'Expected intent' values in the Batch Testing file with your validated labels
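For the post-go-live steps above, the following Python sketch performs the same clean-up as the Excel workflow: it deduplicates utterances, renames the prediction column to 'Expected intent', and sorts by intent for review. It assumes the open_nlu_predict_intent_feedback data was exported to a CSV with lowercase utterance and prediction column headers; adjust the names to match your actual export.

```python
import csv

# Read the exported feedback data (assumed CSV export of the
# open_nlu_predict_intent_feedback table's utterance/prediction columns).
with open("nlu_feedback_export.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

# Deduplicate on the utterance text (case-insensitive), mirroring
# Excel's unique function, and rename prediction -> Expected intent.
seen = set()
test_set = []
for row in rows:
    key = row["utterance"].strip().lower()
    if key and key not in seen:
        seen.add(key)
        test_set.append({"Utterance": row["utterance"],
                         "Expected intent": row["prediction"]})

# Sort by expected intent so related samples are grouped for review.
test_set.sort(key=lambda r: r["Expected intent"])

# Write out the draft test set; review and correct any wrong
# 'Expected intent' values before importing into Batch Testing.
with open("batch_test_set.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["Utterance", "Expected intent"])
    writer.writeheader()
    writer.writerows(test_set)
```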
- Tuning Tips from Batch Testing Results
- ServiceNow recommendations for model quality:
- >80% Correct; <10% Incorrect; <10% Missed (a sketch for computing these percentages from results appears after this list)
- Use the initially created test set as a “golden data set” for future use. Whenever you make changes to the model, you will have a reality-check test suite to ensure you don’t introduce any regression in model performance
- Reviewing Batch Testing results:
- Look for overall tuning opportunities:
- Identify patterns of errors: maybe some terminology is not represented in the model
- Identify opportunities to add vocabulary
- If there are persistent errors, double-check that the model follows our best practices
- For samples that the model missed predicting:
- If this is expected, consider whether you need to support the request at all (or is it just gibberish?) and whether a KB article or catalogue item can provide the user with the information they require
- If this is unexpected, consider adding a few representative samples to the intents where those samples belong
- For samples that the model incorrectly predicted:
- Investigate the predicted intent to see which samples might be contributing to the ambiguity
- Investigate the expected intent and make sure it too has clear and sufficient samples
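To check your results against the quality recommendations above, the sketch below tallies Correct, Incorrect, and Missed outcomes from an exported results file. The CSV layout ('Expected intent' and 'Predicted intent' columns, with an empty prediction meaning no intent matched) is an assumption for illustration; the real Batch Testing results export may use different columns.

```python
import csv
from collections import Counter

# Tally outcomes from an (assumed) CSV export of Batch Testing results.
counts = Counter()
with open("batch_test_results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        expected = row["Expected intent"].strip()
        predicted = row["Predicted intent"].strip()
        if predicted == expected:
            counts["Correct"] += 1     # includes no-intent samples that correctly fell back
        elif not predicted:
            counts["Missed"] += 1      # model returned no intent for a real intent
        else:
            counts["Incorrect"] += 1   # model predicted the wrong intent

# Compare against the recommended thresholds:
# >80% Correct, <10% Incorrect, <10% Missed.
total = sum(counts.values())
if total:
    for label, target in [("Correct", ">80%"), ("Incorrect", "<10%"), ("Missed", "<10%")]:
        print(f"{label}: {100 * counts[label] / total:.1f}% (target {target})")
```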
Additional NLU Related Resources:
- NLU Documentation
- Now Learning – NLU Fundamentals
- Virtual Agent Academy
- Virtual Agent and NLU Quick Start Guide
- In-Depth Guide rails to building good NLU Models
- NLU FAQ, best practices, and general troubleshooting (San Diego release)
- NLU Best Practices – Using Vocabulary & Vocabulary Sources
- NLU Testing Capabilities and Techniques for your NLU Models
- Best Practices: single v. multiple NLU models
- Using NLU Model Optimize to Tune your Model
- NLU Model Optimize – FAQs
- Migrating VA and NLU between instances with update sets
- Virtual Agent and NLU Implementation
- Guided Overview to Implementing Multilingual NLU Models in NOW platform
Additional NLU troubleshooting KBs: