NLU Model Performance
Use NLU Model Performance to see how well your models predicted intents in Virtual Agent (VA) based on end-user confirmation.
Summary usage
The NLU Model Performance application analyzes and reports how well your NLU models predict VA users' intents from their utterances. As NLU models for VA are updated, published, and deployed, you can use NLU Model Performance to review the efficacy of the intent predictions they make. If a prediction is skipped, it's because the model couldn't predict an intent with a confidence score high enough for that model. To improve VA topic prediction, the system groups unsupported VA utterances into clusters for analysis and then generates a report that identifies the outcome of the analysis.
To access this application, use the nlu_admin or admin role and navigate to .
Installation
NLU Model Performance is an application available on the ServiceNow® Store. To use this application, ensure that the NLU Workbench - Advanced Features (sn_nlu_workbench) plugin is active on your instance. For more information, see Install NLU Workbench - Advanced Features and Activate the NLU Workbench. Because NLU Model Performance relies on data coming in from VA, ensure that the Glide Virtual Agent (com.glide.cs.chatbot) plugin is also active. See Activate Virtual Agent.
Configuring Conversational Interfaces settings for ServiceNow NLU
To help with tracking NLU performance, you must first configure the Conversational Interfaces (CI) settings in Virtual Agent. To access these settings, navigate to the top of the NLU Model Performance for Virtual Agent landing page and click Conversational Interfaces Settings. This action takes you to the CI settings page in Virtual Agent, as shown in the image below.
To configure the settings, you need to use the virtual_agent_admin or admin role. For configuration guidance, see Implement NLU in Virtual Agent and Enable NLU languages in Virtual Agent settings.
Reviewing the user interface (UI)
- The Performance tab and its colored chart, which shows data describing scenarios that occur when users interact with VA. This chart is segmented in shades of three colors, each representing a user scenario. The rows in the chart that have text and percentages are referred to as the legend. Each of the three scenarios in the first row of the legend has two outcome scenarios beneath it in the second row. The bars underneath the legend correspond to the colors of the scenarios above them. These bars vary in size based on the percentage value for each scenario.
- For example, the dark blue bar shows you that 10% of this batch of VA users confirmed that the system presented them the correct topic to use for their VA chatbot conversation, and that 58% of the users (represented by the dark red bar) confirmed that the topic presented to them was incorrect. Between these two scenarios is a third, showing that 32% of the users (represented by the dark brown bar) didn't make any confirmation at all.
- You can toggle the Show all bars filter at any time to show or hide the colored bars and their associated scenarios in the chart. When you first access NLU Model Performance, the default view of the chart shows only the first row of scenarios and the first row of bars.
- Other views are available depending on which scenario you're viewing. For example, when you first access NLU Model Performance and you click a scenario in the first row of the legend, the system hides the bars of the other two scenarios. This keeps the UI less cluttered because it partially isolates the scenario you're focusing on from the rest of the scenarios. These actions don't change any data; they simply show you different views of the legend and bars within the chart.
- The Unsupported utterances tab, which shows utterances that are grouped into clusters for use in the performance analysis. This section of the UI is where you can navigate to and return from the Expert Feedback Loop application, and also where you can run the analysis.
Unsupported utterances are different from irrelevant utterances. For more information, see Irrelevance detection in NLU.
- The Performance Details section located underneath the bar chart. This section has these four columns: Utterance, Predicted intent, Prediction model, and Prediction outcome. The details shown in this section interact with the legend data and bar data above them.
Reviewing the VA user scenarios
In the legend section of the UI, the text and percentages are accompanied by Information icons. Point to an icon to display the definition of its user scenario. The table below lists the definitions.
| Scenario | Definition |
|---|---|
| User confirmed as correct | The correct topic was presented to the end user and the end user has confirmed it is correct. |
| Topic launched, user confirmed as correct | The topic was launched and the user confirmed it is the topic they needed. |
| Topic menu presented, user selected one | Multiple topics were shown as a menu to the user and the user chose one of the topics to address their need. |
| User made no confirmation | The topic was launched but the user did not confirm whether or not it met their need. |
| No prediction made, fallback topic launched | No predictions were made and the fallback topic was launched. |
| Topic launched, no user confirmation | The topic was launched but the user did not confirm whether it was correct. |
| User confirmed as incorrect | The topic(s) presented did not address the end user's needs, and the user has confirmed it is not correct. |
| Topic launched, user confirmed as incorrect | The topic was launched and the user confirmed it was not correct. |
| Topic menu presented, user selected none | Multiple topics were shown as a menu to the user and the user decided none of the topics were relevant and did not choose any topic. |
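
The relationship between the two legend rows can be illustrated with a minimal sketch. It assumes the grouping implied by the order of the table above (each first-row scenario followed by its two outcomes) and shows how outcome counts could roll up into the first-row percentages. The function name, the data shape, and the rounding are hypothetical and are not part of the application.

```python
from collections import Counter

# Second-row outcomes mapped to their first-row scenario.
# Assumption: the grouping follows the row order of the table above.
OUTCOME_TO_SCENARIO = {
    "Topic launched, user confirmed as correct": "User confirmed as correct",
    "Topic menu presented, user selected one": "User confirmed as correct",
    "No prediction made, fallback topic launched": "User made no confirmation",
    "Topic launched, no user confirmation": "User made no confirmation",
    "Topic launched, user confirmed as incorrect": "User confirmed as incorrect",
    "Topic menu presented, user selected none": "User confirmed as incorrect",
}


def scenario_percentages(outcomes):
    """Return each first-row scenario's share of all outcomes, as a whole-number percentage."""
    counts = Counter(OUTCOME_TO_SCENARIO[outcome] for outcome in outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {scenario: round(100 * n / total) for scenario, n in counts.items()}


# Hypothetical batch of 50 conversations.
sample = (
    ["Topic launched, user confirmed as correct"] * 5
    + ["Topic launched, no user confirmation"] * 16
    + ["Topic launched, user confirmed as incorrect"] * 29
)
print(scenario_percentages(sample))
```

With this hypothetical batch, the shares come out to 10%, 32%, and 58%, which matches the example percentages used earlier for the three bars.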
Using the Language and Model filters
The Performance tab contains the Language and Model filters. Next to them, above the legend and bars of the chart, you can also see the Date range values you most recently set. Click the Language filter to see all languages that are available in NLU. Click the Model filter to see all prediction models that are available in your instance. The default setting for the Language filter is All languages, and the default setting for the Model filter is All models.
The Language and Model filters interact with each other. For example, if you choose French-fr in the Language filter, the Model filter automatically shows all prediction models in your instance that use the French language, as shown in the following image.
- If you select a specific model from the Model filter, the Language filter value is updated to display the language of the selected model.
- If you select a specific language from the Language filter, the Model filter only displays models of that selected language.
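
The two rules above can be sketched as a pair of lookups. This is an illustration only, not the application's implementation: the `Model` class, the function names, and the model names are hypothetical, and `English-en` is assumed by analogy with the `French-fr` value mentioned above.

```python
from dataclasses import dataclass

ALL_LANGUAGES = "All languages"  # default Language filter value
ALL_MODELS = "All models"        # default Model filter value


@dataclass(frozen=True)
class Model:
    name: str       # hypothetical model name
    language: str   # for example "French-fr"


def models_for_language(models, language):
    """Selecting a language narrows the Model filter to models of that language."""
    if language == ALL_LANGUAGES:
        return list(models)
    return [m for m in models if m.language == language]


def language_for_model(models, model_name):
    """Selecting a model updates the Language filter to that model's language."""
    if model_name == ALL_MODELS:
        return ALL_LANGUAGES
    for m in models:
        if m.name == model_name:
            return m.language
    raise ValueError(f"Unknown model: {model_name}")


# Hypothetical models on an instance.
models = [
    Model("IT Support FR", "French-fr"),
    Model("HR Support FR", "French-fr"),
    Model("IT Support EN", "English-en"),
]
print([m.name for m in models_for_language(models, "French-fr")])
print(language_for_model(models, "IT Support EN"))
```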
Setting the Date range
On the Performance tab, use the Date range to define how far back the system pulls VA data. Choose Past 30 days, Past 60 days, Past 90 days, or Custom range. The farther back the date range extends, the more data the analysis includes.
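
As a rough illustration of what each Date range option means, the following sketch resolves a choice into a start and end date. The function name is hypothetical, and the assumption that a preset counts back from today by exactly that many days is an illustration choice; the application may handle window boundaries differently.

```python
from datetime import date, timedelta

# Preset options from the Date range control, mapped to a number of days.
PRESETS = {"Past 30 days": 30, "Past 60 days": 60, "Past 90 days": 90}


def resolve_date_range(choice, today, custom=None):
    """Return (start, end) dates for a Date range choice.

    `custom` is a (start, end) tuple and is only used for "Custom range".
    """
    if choice == "Custom range":
        if custom is None:
            raise ValueError("Custom range requires a (start, end) tuple")
        start, end = custom
        if start > end:
            raise ValueError("Start date must not be after end date")
        return start, end
    if choice not in PRESETS:
        raise ValueError(f"Unknown date range: {choice}")
    # Assumption: the window ends today and starts N days earlier.
    return today - timedelta(days=PRESETS[choice]), today


start, end = resolve_date_range("Past 30 days", today=date(2024, 3, 31))
print(start, end)
```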
Running an analysis
To run a performance analysis, click the Unsupported utterances tab. This section of the UI shows rows of expandable clusters containing VA utterances where NLU didn't make a topic prediction, or where the VA end user confirmed that the predicted topic was incorrect. Next, click Expert Feedback Loop. This action takes you to the NLU Expert Feedback Loop application, where you review and provide feedback on the utterances that were pulled in from VA.
When you leave the Expert Feedback Loop application and return to NLU Model Performance, click the caret icon in any cluster to open it. Within the cluster, you can then see the most representative VA utterances, as shown in the following image.
Click Run analysis or Rerun analysis, whichever is available. Each time you run an analysis, the system pulls the most recent VA utterances into the analysis.
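
The criterion for which utterances appear on the Unsupported utterances tab can be expressed as a simple predicate. The sketch below only illustrates that criterion (no topic prediction, or a prediction the end user confirmed as incorrect); the `Utterance` record and its field names are hypothetical, and the clustering step itself is not modeled.

```python
from typing import NamedTuple, Optional


class Utterance(NamedTuple):
    text: str
    predicted_topic: Optional[str]   # None when NLU made no prediction
    user_confirmed: Optional[bool]   # True, False, or None for no confirmation


def is_unsupported(utterance):
    """True if NLU made no prediction or the user confirmed the topic was incorrect."""
    return utterance.predicted_topic is None or utterance.user_confirmed is False


# Hypothetical utterances pulled from VA.
utterances = [
    Utterance("reset my password", "Password Reset", True),
    Utterance("my badge stopped working", None, None),
    Utterance("order a new laptop", "Password Reset", False),
    Utterance("check ticket status", "Ticket Status", None),
]
unsupported = [u.text for u in utterances if is_unsupported(u)]
print(unsupported)
```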