Gene Shtilkind
ServiceNow Employee

Overview

Customers can set up and test Task Intelligence safely in production or in sub-production for both CSM and ITSM solutions.

In addition to the typical testing that customers do in their ServiceNow applications, testing Machine Learning models often requires an additional phase of assessing a model’s performance, to answer the question: “How accurately does my model predict what I want it to predict?”

 

Depending on the type of Task Intelligence model, assessing the performance may be objective or more subjective. With “Case or incident field prediction” models, such as a model predicting the assignment group of a task, it may be easy to evaluate whether the model predicted correctly or incorrectly, by comparing the model’s prediction against the group which resolved the task. In contrast, evaluating a sentiment analysis CSM model is more subjective. What is the difference between an unsatisfied customer and a highly unsatisfied customer? It’s not always clear.

 

Testing in Production

Testing in production is often easier than testing in sub-production because there may be mismatches between the configurations and data in your production and sub-production environments. Task Intelligence models include a Monitoring Mode preference setting specifically to enable safe testing in the production environment.

 

Assess Your Model screen

The Assess Your Model screen allows you to evaluate the model’s performance. After training your model, the “Assess Your Model” screen will show an estimate of the model’s average performance against your most recent data. It is normal to see some variability in model performance from day to day. The performance tends to average out to the estimated performance over time.


 



In addition to seeing the aggregate statistics on model performance, the Assess Your Model screen also allows you to view example predictions on a sample of records. This small sample of predictions should not be used to evaluate the quality or average performance of the model. For an understanding of model quality, rely on the estimates provided on the "Assess Your Model" screen as well as the reports on the “Monitoring” screen, because these statistics are calculated from a much larger number of records.

 

Monitoring mode (Optional)

If you are satisfied with your model performance on the Assess Your Model screen, no further evaluation is required; however, additional evaluation in Monitoring Mode can reinforce your expectations of model performance.
Monitoring Mode runs predictions in the background without any impact to users of the platform. Prediction results will be visible on the “Monitoring” page in Task Intelligence.

 

Assessing Field-level accuracy for multi-output models in Monitoring Mode (Optional)

Task Intelligence enables customers to create one or more “field prediction” models as needed, each of which can predict multiple output fields on incident or case records. When evaluating a model with multiple output fields in monitoring mode, it can be helpful to assess the accuracy (% of correct predictions) of each field independently.


If you would like to assess the accuracy of each field after running the model in monitoring mode, follow these steps (a scripted version of this calculation is sketched after the list):

  1. Go to the “ml_predictor_results_task” table by entering “ml_predictor_results_task.list” in the Application Navigator.
  2. For the output field you want to assess, filter for all records where the “Predicted Output Value Name” column matches that field; for example, “Predicted Output Value Name” is “product”.
  3. Group by the “Predicted Correctly” Boolean field.
  4. Divide the number of records where “Predicted Correctly” is “True” by the total number of records. This is your accuracy for that field.
  5. Repeat steps 2–4 for each field for which you would like to calculate the accuracy.
  6. If the accuracy of each field is acceptable, transition your model from monitoring mode to real-time predictions and deploy it. If the accuracy of a specific field is not acceptable, you can remove that output field from your model, retrain, and deploy the model.
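
If you prefer to script this calculation rather than filter lists manually, the sketch below pulls prediction results through the ServiceNow REST Table API and computes per-field accuracy. It is a minimal sketch under stated assumptions: the instance URL and credentials are placeholders, and the column names “predicted_output_value_name” and “predicted_correctly” are assumptions based on the display names above — confirm the actual field names on the ml_predictor_results_task table in your instance before running it.

```python
# Minimal sketch: per-field accuracy from ml_predictor_results_task via the
# ServiceNow REST Table API. The instance URL, credentials, and the two column
# names below are assumptions -- verify them against your instance first.
import requests

INSTANCE = "https://your-instance.service-now.com"  # placeholder instance URL
AUTH = ("api.user", "api.password")                 # placeholder credentials
OUTPUT_FIELDS = ["product", "assignment_group"]     # output fields your model predicts


def field_accuracy(field_name: str):
    """Return (correct, total) prediction counts for one output field."""
    params = {
        # Assumed column names; the UI display names are
        # "Predicted Output Value Name" and "Predicted Correctly".
        "sysparm_query": f"predicted_output_value_name={field_name}",
        "sysparm_fields": "predicted_correctly",
        "sysparm_limit": "10000",
    }
    resp = requests.get(
        f"{INSTANCE}/api/now/table/ml_predictor_results_task",
        auth=AUTH, params=params, headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    rows = resp.json()["result"]
    correct = sum(1 for r in rows if r.get("predicted_correctly") == "true")
    return correct, len(rows)


for field in OUTPUT_FIELDS:
    correct, total = field_accuracy(field)
    accuracy = correct / total if total else 0.0
    print(f"{field}: {correct}/{total} predicted correctly ({accuracy:.1%})")
```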

How to know what is “good enough”

If the model is automating part of an existing process, a useful benchmark is the performance of the current process. For example, if you are creating a model to automatically route tickets to the correct assignment group, you likely have data on how often tickets are closed on first assignment under your existing process. If the model performs as well as the current process and saves time for the organization, the choice to automate routing with Task Intelligence is easy.


What happens if the model performs somewhat worse than the current process?

A lower level of performance is often acceptable. In this situation, you will be assessing the trade-off between the additional volume of incorrect predictions and the time-savings. This decision must be evaluated on a case-by-case basis. For example, reducing the first assignment rate can have a large impact on customer satisfaction and agent productivity, so any significant decrease in performance may be unacceptable. However, there are many scenarios where a drop in the quality of the task being performed does not significantly impact business operations. In these scenarios, the time savings achieved by using a Machine Learning model may be an acceptable trade-off.

If you are not confident in deploying a model in auto-fill mode because its quality appreciably underperforms the current process, the model could instead be deployed in “Recommendations” mode. Recommendations save agents time on fields with many options to select from, and they leave less room for error because there is a human in the loop.

 

How do I identify the performance of the current process?

In the case of “Assignment Group”, you can expect that the final value in the assignment group field is correct. Therefore, you can use a metric like “First Assignment Resolution Rate” to identify the performance of your current process.

For fields which don’t influence routing (e.g. “Category”, “Configuration Item”, “Priority”, etc.), it should be assumed that there are a fair number of mistakes, in which the final value in a field is not the correct value. When assessing how accurately these fields are filled in, a manual audit of a couple hundred tickets can produce a fairly accurate benchmark of current performance. For example, review the “Category” set on 200 tickets and measure what percentage were set correctly by the agent.
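
If it helps to script the sampling step of such an audit, the sketch below pulls 200 recent incidents into a CSV that an SME can mark up. It assumes REST Table API access; the instance URL and credentials are placeholders, and you would swap in sn_customerservice_case and the relevant fields for a CSM audit.

```python
# Minimal sketch: export 200 recent incidents to CSV for a manual "Category"
# audit. The instance URL and credentials are placeholders; use
# sn_customerservice_case instead of incident for a CSM audit.
import csv
import requests

INSTANCE = "https://your-instance.service-now.com"  # placeholder
AUTH = ("api.user", "api.password")                 # placeholder

params = {
    "sysparm_query": "ORDERBYDESCsys_created_on",   # most recent tickets first
    "sysparm_fields": "number,short_description,category",
    "sysparm_display_value": "true",
    "sysparm_limit": "200",
}
resp = requests.get(f"{INSTANCE}/api/now/table/incident", auth=AUTH,
                    params=params, headers={"Accept": "application/json"})
resp.raise_for_status()
rows = resp.json()["result"]

# One row per ticket, with an empty column for the reviewer's verdict.
with open("category_audit_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["number", "short_description", "category", "category_correct"])
    for r in rows:
        writer.writerow([r["number"], r["short_description"], r["category"], ""])

print(f"Wrote {len(rows)} tickets to category_audit_sample.csv")
```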

 

Testing in Sub-production

If you intend to train and evaluate the models in sub-production, it is imperative that the data in sub-production exactly matches the data in production. If there is a mismatch between the data in sub-production and production, the results of a test in sub-production may differ significantly from the results you see once the model is deployed in production.


Synchronizing Data

Before training a model, ensure that the data in your sub-production environment matches the production environment, at least for the tables against which you are making predictions. You will need to sync down all the relevant task records, records for any fields being predicted, attachments, and e-mails, as appropriate.

 

Task Intelligence for ITSM predicts directly on incident data, so this is less of a concern: normal clones should work just fine. Task Intelligence for CSM models, however, also support predictions from email-generated cases with or without attachments, so special handling may be needed. Because tables like email and attachments tend to be very large, they may be omitted from normal clones by design.


For example, if you are predicting the “Assignment group” and “Product” fields on the Customer Service Case table for e-mail-based cases with attachments, you will need to sync the following tables:

  • Assignment Group
  • Product
  • Customer Service Case
  • Attachment
  • E-mail

In such cases, the easiest way to ensure that data is synchronized from production to a non-production environment is to do an “Instance Restore”, overwriting your sub-production environment with the latest snapshot from your production environment. An instance restore can be performed on https://support.servicenow.com/now?id=ns_automation_store.


Note: Unlike cloning an instance, a restore will also carry over e-mail logs, which may be necessary to train a Task Intelligence for CSM model that makes predictions from email-generated cases.

 

Identifying field update conflicts

Note: This section only applies to “field prediction” models set to “auto-fill”.


 

After validating that the performance of a model is satisfactory to go live, it is necessary to ensure that the model does not conflict with other components which may be updating the same field.

There are multiple mechanisms to set values on a field when a task is created or updated, including business rules, client scripts, and machine learning models. When two components set values on a single field, it can result in conflicts where one component overrides another. As part of the testing, ensure that there are no situations in which different components are overriding each other by setting values on the same field.

  1. Deploy the model in your sub-production environment.
  2. Import incidents or cases (depending on the type of Task Intelligence model involved) from your production environment, ensuring the model runs predictions against those records.
  3. Check the audit log (sys_audit), verifying that the fields being predicted are not being updated. If the fields show up in the audit log in your sub-production environment, where no one is making changes to the records, it indicates that something else in your environment is updating those records. Resolve this conflict before deploying in production (a query sketch for this check follows the list).
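
As a convenience for step 3, the sketch below summarizes recent sys_audit writes to the predicted fields so a competing business rule or script is easier to spot. It assumes REST Table API access; the instance URL, credentials, and the PREDICTED_FIELDS list are placeholders to adapt to your model.

```python
# Minimal sketch: summarize recent sys_audit writes to the predicted fields.
# Instance URL, credentials, and the PREDICTED_FIELDS list are placeholders.
from collections import Counter
from datetime import datetime, timedelta, timezone

import requests

INSTANCE = "https://your-instance.service-now.com"  # placeholder
AUTH = ("api.user", "api.password")                 # placeholder
TABLE = "incident"                                  # or sn_customerservice_case
PREDICTED_FIELDS = ["assignment_group", "category"] # fields your model auto-fills

# sys_created_on is stored in UTC, so build the cutoff in UTC as well.
since = (datetime.now(timezone.utc) - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
params = {
    "sysparm_query": (f"tablename={TABLE}"
                      f"^fieldnameIN{','.join(PREDICTED_FIELDS)}"
                      f"^sys_created_on>={since}"),
    "sysparm_fields": "fieldname,user,newvalue",
    "sysparm_limit": "10000",
}
resp = requests.get(f"{INSTANCE}/api/now/table/sys_audit", auth=AUTH,
                    params=params, headers={"Accept": "application/json"})
resp.raise_for_status()
entries = resp.json()["result"]

# Anything writing to these fields other than the expected prediction user
# suggests a business rule or script is competing with the model.
writes = Counter((e["fieldname"], e["user"]) for e in entries)
for (field, user), count in writes.most_common():
    print(f"{field}: {count} update(s) by {user}")
```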

Field conflicts should be evaluated in sub-production, corresponding with standard application testing practices.


Note: As of the Utah release, the Task Intelligence Admin Console shows a warning, prior to training, if you have two machine learning models making predictions against the same field.


Troubleshooting underperforming models (Advanced)

This section applies only to models trained on your instance data. This is not applicable for pre-trained models, such as Sentiment Analysis and Language Detection.


Models with low accuracy after training

If a model exhibits low accuracy after training, the cause can range from the model configuration to underlying data quality issues. There is a lot to say about improving model performance, and we will provide additional content over time to help guide customers. In the meantime, please refer to this series of articles on model tuning.

 

Models showing lower accuracy in production than in training

Occasionally, models may exhibit strong performance after training, on the “Assess Your Model” screen, but see a drop off in performance in production. If the decrease in performance is small, this may reflect natural day-to-day variation; however, if there is a significant, sustained drop in performance, this likely indicates that your production data does not reflect your training data.

This may occur if the model was trained on too little historical data or if the training data is no longer representative of current production data. For example, if a model predicting the correct assignment group for a record is trained on an outdated set of assignment groups, it will not be able to predict the assignment groups it has never seen.

 
To mitigate these issues:

  1. Ensure that your model is trained on between 3 and 12 months of data.
  2. For Case records, ensure that the channel in the training filter matches the channel in which the cases are being received. If you are running the model against e-mail-based tickets, the training filter should include “channel=email”.
  3. Ensure the values available in the training dataset are similar to the values available in the current production environment. If there is a significant discrepancy, retrain the model.
    For example, for a model predicting assignment groups (a scripted version of this comparison is sketched after the list):
    1. Copy the query used to train the model
    2. Run that query against the sn_customerservice_case or incident table, depending on the Task Intelligence model type
    3. Group by Assignment Group and copy down the list of all assignment groups as “Trained Model Assignment Groups”
    4. Remove the current filter conditions and filter for cases or incidents from the last 30 days
    5. Group by Assignment Group and copy down the list of all assignment groups as “Current Assignment Groups”
    6. Compare the “Trained Model Assignment Groups” and “Current Assignment Groups”. If there is a significant difference, retraining the model may resolve the issue by allowing the model to learn about the new assignment groups.
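
The sketch below automates that comparison under stated assumptions: it reads assignment group display values through the REST Table API, and the instance URL, credentials, and both encoded queries are placeholders — paste your model’s actual training query into TRAINING_QUERY.

```python
# Minimal sketch: compare assignment groups seen in the training window with
# those used in the last 30 days. Instance URL, credentials, and both encoded
# queries are placeholders -- paste in your model's actual training query.
from datetime import datetime, timedelta, timezone

import requests

INSTANCE = "https://your-instance.service-now.com"  # placeholder
AUTH = ("api.user", "api.password")                 # placeholder
TABLE = "incident"                                  # or sn_customerservice_case

now = datetime.now(timezone.utc)
TRAINING_QUERY = "sys_created_on>=" + (now - timedelta(days=365)).strftime("%Y-%m-%d %H:%M:%S")
CURRENT_QUERY = "sys_created_on>=" + (now - timedelta(days=30)).strftime("%Y-%m-%d %H:%M:%S")


def assignment_groups(encoded_query: str) -> set:
    """Return the set of assignment group display values matching a query."""
    params = {
        "sysparm_query": encoded_query,
        "sysparm_fields": "assignment_group",
        "sysparm_display_value": "true",
        "sysparm_limit": "10000",
    }
    resp = requests.get(f"{INSTANCE}/api/now/table/{TABLE}", auth=AUTH,
                        params=params, headers={"Accept": "application/json"})
    resp.raise_for_status()
    groups = set()
    for row in resp.json()["result"]:
        value = row.get("assignment_group")
        # Reference fields may come back as a dict ({"display_value": ..., "link": ...})
        # or as a plain string depending on the request; handle both.
        if isinstance(value, dict):
            value = value.get("display_value")
        if value:
            groups.add(value)
    return groups


trained = assignment_groups(TRAINING_QUERY)   # "Trained Model Assignment Groups"
current = assignment_groups(CURRENT_QUERY)    # "Current Assignment Groups"
missing = current - trained
print(f"Groups active in the last 30 days but absent from training data: {sorted(missing)}")
```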

A performance drop may also occur if there has been a change in the organization that is reflected in changes in production data (commonly referred to as data drift). Over time, organizations and their customers change. As these changes occur, Machine Learning models need to be retrained to keep up to date with their environment. We recommend retraining your model every 1-2 months. Retraining can be done by going to the “Train Your Model” screen for your existing model in the Task Intelligence admin console and clicking “Launch Training”. After retraining, the performance of your new model can be compared to the original.

 

If retraining is insufficient to achieve the initial model performance estimated on the “Assess Your Model” screen, please file a case with ServiceNow Support.

Identifying and Resolving Data Quality Issues

Machine Learning models rely on high-quality data to drive accurate predictions. You may find, in certain circumstances, that your data may not be reliable and is resulting in your model achieving suboptimal performance. For example, if fields like “Priority” or “Category” are being set inconsistently in the current process, the machine learning model will struggle to learn meaningful patterns in the data and will perform poorly. In these situations, you may choose to implement a data cleanup to fix quality issues retroactively or a process change to improve data quality going forward.


Depending on the nature of the issue, instituting process changes can improve your data quality in anywhere from a few days to a few weeks. After the specific data quality issue has been addressed, retrain the machine learning model and evaluate the performance. Expect to see a major improvement.

 

Rules of thumb

  • Expect fields used for routing (e.g. assignment group) to be much more accurate than fields used for reporting (e.g. category, subcategory, service offering, product).
  • Fields which have a smaller number of options (e.g. Priority) are more likely to be accurate than fields with a large number of options (e.g. Configuration Item).
    Did you know: In Machine Learning, the number of options is referred to as the “cardinality”. Fields with more options have higher cardinality (a sketch for measuring cardinality follows this list).
  • Fields set by end-users are likely to suffer from more data quality issues than those set by agents. For example, if you ask your customer to set the Priority of their ticket, they’re very likely to set the priority as “High” or “Critical” because they don’t have a benchmark against which to calibrate, nor are they concerned with the relative importance of their issues as compared to other customers.
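
If you want a quick read on cardinality before choosing output fields, the sketch below counts the distinct values each candidate field has taken on recent records. It assumes REST Table API access; the instance URL, credentials, table, and field names are placeholders to adjust for your own case or incident model.

```python
# Minimal sketch: estimate the cardinality of candidate output fields over the
# last 90 days of records. Instance URL, credentials, table, and field names
# are placeholders.
from collections import Counter
from datetime import datetime, timedelta, timezone

import requests

INSTANCE = "https://your-instance.service-now.com"  # placeholder
AUTH = ("api.user", "api.password")                 # placeholder
TABLE = "incident"                                  # or sn_customerservice_case
CANDIDATE_FIELDS = ["priority", "category", "cmdb_ci"]

since = (datetime.now(timezone.utc) - timedelta(days=90)).strftime("%Y-%m-%d %H:%M:%S")
params = {
    "sysparm_query": f"sys_created_on>={since}",
    "sysparm_fields": ",".join(CANDIDATE_FIELDS),
    "sysparm_display_value": "true",
    "sysparm_limit": "10000",
}
resp = requests.get(f"{INSTANCE}/api/now/table/{TABLE}", auth=AUTH,
                    params=params, headers={"Accept": "application/json"})
resp.raise_for_status()
rows = resp.json()["result"]


def display(value):
    # Reference fields may come back as a dict; reduce to the display value.
    return value.get("display_value") if isinstance(value, dict) else value


for field in CANDIDATE_FIELDS:
    values = Counter(display(r.get(field)) for r in rows if display(r.get(field)))
    print(f"{field}: {len(values)} distinct values across {sum(values.values())} populated records")
```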

Examples of data quality issues

  1. Assignment groups with overlapping responsibilities – Sometimes, there are two assignment groups responsible for the same product or service and it can be unclear, even to a human, which group should handle the ticket. If there is an overlap in the types of records both assignment groups work on, the model may not be able to learn which group is the best group to resolve an issue.
    The best way to address this issue is to revisit the ownership structure, making a clear distinction between what each group owns.
  2. Incorrect categorization – Categorization (the Category/Subcategory fields) is a common area in which we see poor data quality. Because categories don’t influence routing and agents are not aware of the cost of miscategorization, categories are often incorrect or imperfect.
  3. Missing information – If fields are optional, agents often will not populate them. If most of the tickets have empty values for the field, the model will not have enough examples from which to learn meaningful patterns.

Addressing data quality issues

  1. Monthly audits by subject matter experts – Have SMEs, such as process owners or agent managers, review important fields for correctness. Focus especially on fields not used for routing (priority, category, product, service offering, etc.), because those tend to contain more inconsistent/incorrect values.
  2. Agent training – Provide guidelines for filling fields, helping agents understand how the data is used and how data quality impacts other processes.
  3. Removing fields – If the field isn’t important, don’t ask agents to spend their time filling it in. By removing fields, agents can spend more of their attention on the customer and on filling correct information for the fields which matter.
  4. Taxonomy cleanup - Schedule a recurring taxonomy review as needed (monthly, yearly), depending on the criticality of the data.
    1. Remove outdated values
    2. Remove duplicates – Remove duplicate display values from reference lists, or change the display value so that it clearly indicates the difference between two options (for example, two employees named John Smith who would otherwise appear identically in a reference list).
  5. Mandatory fields – Fields which are optional will have significantly less data populated than mandatory fields. Making a field mandatory can increase the number of records with data populated by 10x or 100x.
  6. Encourage feedback from agents – Give them a channel to report issues or ask questions about certain fields.
  7. Add reference qualifiers to reduce the number of agent-visible options – Reference fields often have many options. By eliminating irrelevant options, you reduce both the likelihood of mistakes and the amount of time agents spend searching through lists.
Comments
Rupam39
ServiceNow Employee

Super cool!! @Gene Shtilkind 

nilimadesai
ServiceNow Employee

This is really very useful information Gene!

Community Alums
Not applicable

This is great..thank you for these!

Sebastian R_
Kilo Sage

@Gene Shtilkind Do you have more information on how the field values are set technically?


@Gene Shtilkind wrote: There are multiple mechanisms to set values on a field when a task is created or updated, including business rules, client scripts, and machine learning models. When two components set values on a single field, it can result in conflicts where one component overrides another

Is a machine learning model / solution definition really its own component which can trigger record updates? Is there a way to disable the "autofill" for some records (e.g. all with channel = Self-Service)?

Tanvi Sharma1
Tera Contributor

Looking for some expedited response.

After Task Intelligence models are created and deployed for CSM, do I need to also configure Recommended Actions for cases in order for the predicted field values to show up on the case record page in the CSM Configurable Workspace? The ServiceNow documentation around Task Intelligence for CSM nowhere mentions this as a required configuration for Task Intelligence for CSM to show predicted values on the case form in the CSM Configurable Workspace.
Therefore, asking it here as the predicted values are not showing on workspace even when the Task Intelligence model has been created and deployed. 

Bhanu Sirineni
ServiceNow Employee

 

@Tanvi Sharma1 Hi Tanvi. Glad to see BT trying out Task Intelligence in parallel with Now Assist for CSM. Short answer: you do not need Recommended Actions for Task Intelligence predictions to show up on the CSM workspace. Can you please try these instructions? If it still doesn't work, you have my email. Please do reach out.

 

 

The steps to set the Voltron pages as the default pages:

  1. Navigate to Now Experience Framework > Experiences.
  2. In the Title column, select CSM/FSM Configurable Workspace.
  3. Select Open in UI Builder.
  4. In the Name column, locate the CSM default record page and select Settings.
  5. Enable the Active check box.
  6. Set the Order to -1000.
  7. Select Save.
  8. In the Name column, locate the CSM Interaction record page and select Settings.
  9. Enable the Active check box.
  10. Set the Order to -1000.
  11. Select Save.

 
