Test set creation and management

  • Release version: Xanadu
  • Updated August 1, 2024
  • 4 minutes to read

    Use the default test set of your NLU model to test the model's performance and accuracy. Manage your test set over time by building or updating its content in the NLU Workbench.

    Note:
    To test your model, install the ServiceNow® Store application NLU Workbench - Advanced Features. For more information, see Install NLU Workbench - Advanced Features.
    When you create an NLU model for Virtual Agent or AI Search, a default test set is created and associated with the model. You can use the default test set to evaluate the model's performance. Initially, the test set is empty, ready to be populated with your content.

    Access your default test set

    Access your default test set with one of the following methods.
    • Navigate to All > NLU Workbench > Models. Select the tab for your model's application, then select the name of your model from the list. On the model's overview page, find the Build and train your model card and select its View phase button. Then select the Test set tab.
    • Navigate to All > NLU Workbench > Models. Select the tab for your model's application, then select the name of your model from the list. On the model's overview page, select the Test Coverage tile.
    • Navigate to All > Multi-model Batch Testing > Test sets tab. Find the name of your model. Default test sets are labeled as Default.

    Add content to your default test set

    Add utterances and their expected intents to build and manage your test set over time. You can add content to the default test set with the following methods:

    • Add test utterances and their expected intents manually. From the model's overview page, navigate to Build and train your model > Test set tab. Type your input into the Type a test utterance here field, select an appropriate intent, then select the Add button.

      These test utterances are assigned a source of Manual.

    • Import test utterances and their expected intents from a CSV file or from other models. From the model's overview page, navigate to Build and train your model > Test set tab, then select Import test utterances.

      Imported test utterances are assigned a source of Manual.

    • The Expert Feedback Loop feature lets you add actual user utterances from Virtual Agent chat logs to the test set.

      These test utterances are assigned a source of Expert Feedback. For more information, see NLU Expert Feedback Loop.
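
    For the CSV import option above, it can help to assemble the file with a script instead of by hand. The following is a minimal sketch, not the documented import format: the header names utterance and intent and the helper build_test_csv are hypothetical placeholders, so match the columns to whatever the Import test utterances dialog expects on your instance. The sketch also drops blank rows and case-insensitive duplicate utterances, which is a design choice of this example and not a stated product requirement.

```python
import csv
import io


def build_test_csv(rows, header=("utterance", "intent")):
    """Return CSV text for (utterance, expected_intent) rows.

    The header names are hypothetical placeholders; adjust them to the
    layout your import dialog expects.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    seen = set()
    for utterance, intent in rows:
        text = utterance.strip()
        key = text.casefold()
        if not text or key in seen:
            continue  # skip blank rows and duplicate utterances
        seen.add(key)
        writer.writerow([text, intent])
    return buffer.getvalue()


rows = [
    ("I forgot my password", "reset_password"),
    (" i forgot my password ", "reset_password"),  # duplicate, dropped
    ("Need a new laptop, please", "order_laptop"),
]
csv_text = build_test_csv(rows)
print(csv_text)
```

    Fields that contain a comma are quoted automatically by the csv module, so utterances with punctuation survive the round trip.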

    Test Coverage

    The Test Coverage score is the percentage of a model's enabled intents that have test utterances in the default test set. Before testing your model, ensure that there is at least 60% coverage. The higher the Test Coverage score, the more accurate the performance testing results.

    For the system to provide an optimal confidence threshold during batch testing, your test coverage must be at least 60%, with at least 5 test utterances per intent. For more information about the confidence threshold, see NLU model settings.

    Aim to have about 10% of a model's test utterances marked as "not relevant", meaning that no intent is associated with them. This helps assess how the model handles irrelevant utterances, which should not have any intent predicted. For more information about irrelevant utterances, see Irrelevance detection in NLU.
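
    The three guidelines above (coverage of enabled intents, utterances per intent, and the share of "not relevant" utterances) can be checked offline before you run a batch test. The following is a rough sketch under stated assumptions: the test set is a list of (utterance, expected_intent) pairs, a "not relevant" utterance is represented by None, and coverage_report is a hypothetical helper, not a ServiceNow API. It treats coverage and the per-intent minimum as separate checks, because the text does not say whether an intent with fewer than 5 utterances still counts as covered.

```python
from collections import Counter

NOT_RELEVANT = None  # assumption: "not relevant" utterances carry no intent


def coverage_report(enabled_intents, test_set, min_per_intent=5):
    """Summarize a test set given as (utterance, expected_intent) pairs."""
    enabled = set(enabled_intents)
    # Count test utterances only for intents that are enabled in the model.
    counts = Counter(intent for _, intent in test_set if intent in enabled)
    covered = {intent for intent in enabled if counts[intent] > 0}
    coverage_pct = 100.0 * len(covered) / len(enabled) if enabled else 0.0
    # Intents below the per-intent minimum, including those with none.
    under_tested = sorted(i for i in enabled if counts[i] < min_per_intent)
    not_relevant = sum(1 for _, intent in test_set if intent is NOT_RELEVANT)
    not_relevant_pct = 100.0 * not_relevant / len(test_set) if test_set else 0.0
    return {
        "coverage_pct": coverage_pct,
        "under_tested": under_tested,
        "not_relevant_pct": not_relevant_pct,
    }


enabled = ["reset_password", "order_laptop", "book_room",
           "report_outage", "request_access"]
test_set = (
    [(f"reset example {n}", "reset_password") for n in range(5)]
    + [(f"laptop example {n}", "order_laptop") for n in range(5)]
    + [(f"room example {n}", "book_room") for n in range(5)]
    + [(f"outage example {n}", "report_outage") for n in range(3)]
    + [(f"small talk {n}", NOT_RELEVANT) for n in range(2)]
)
report = coverage_report(enabled, test_set)
print(report)
```

    In this example, four of the five enabled intents have test utterances, so coverage is 80%, which clears the 60% guideline. Two intents (report_outage and request_access) still fall short of 5 utterances each, and 2 of the 20 utterances (10%) are marked not relevant.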

    Use the test set

    To use the default test set from the Test and publish your model phase, see Test and publish your model.

    To use the test set in Multi-model Batch Testing, see Multi-model Batch Testing.

    Characteristics of default test sets

    When an instance is upgraded, default test sets are created for any existing models that don't already have them.

    When you copy a model using Duplicate this model, the original's default test set is copied into the new model. For more information, see Duplicate an NLU model.

    The utterances in the test set shouldn't be the same as the utterances in the training set.

    Default test sets can't be deleted separately from their models.

    Test set utterances should be in the same language as their model.

    Default test sets are available for Virtual Agent and AI Search models.
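
    The guideline that test utterances shouldn't repeat training utterances can be checked with a simple set comparison. This is a minimal sketch, assuming you have both utterance lists available as plain strings (for example, from exported CSV files); normalize and find_overlap are hypothetical helper names, and the case-insensitive, whitespace-insensitive matching is a choice made for this example.

```python
def normalize(utterance):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(utterance.casefold().split())


def find_overlap(training_utterances, test_utterances):
    """Return the test utterances that also appear in the training set."""
    trained = {normalize(u) for u in training_utterances}
    return [u for u in test_utterances if normalize(u) in trained]


training = ["Reset my password", "Order a laptop"]
testing = ["reset  my PASSWORD", "My badge is lost"]
overlap = find_overlap(training, testing)
print(overlap)
```

    Any utterance the check returns is a candidate to rephrase or replace in the test set, so that test results reflect how the model handles input it was not trained on.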

    Downloading or moving default test sets

    Default test sets can be downloaded or moved as follows.

    • Default test sets can be downloaded separately in CSV format. To download the test set, from the model's overview page, navigate to Build and train your model > Test set tab. Select Download test set.
      Note:
      Test sets downloaded with Download test set contain test utterances and their expected intents, but not the sources.
    • Default test sets can be moved with update sets. When you add an NLU model to an update set, its default test set is added, including test utterances, expected intents, and sources. For more information, see Add an NLU model to an update set.
    • When using the Export model as CSV function in the All existing models table, the default test set is not included. For more information, see Export an NLU model.