Darshan Hiranandani: How to fix class imbalance in ServiceNow tables for Predictive Intelligence?

darshanhiranand
Giga Contributor

I’m working with the incident table in ServiceNow to train a Predictive Intelligence model, but I’m encountering issues due to class imbalance in the dataset. The records are not evenly distributed across categories, which seems to affect the model's performance.

Could you provide suggestions on how to balance the dataset records for each category type before training the model?

 

Regards

Darshan Hiranandani

4 REPLIES

Eoghan Sinnott
Kilo Sage

Lener Pacania1
ServiceNow Employee

This is an excerpt from a class I taught with PM at Knowledge 24 around Predictive/Task Intelligence Tuning:

 

If you have heavy class imbalance, what do you do?

 

If predicting the field is very important or high value for your business, there are techniques, such as oversampling and undersampling, for creating a more balanced training dataset. There are no easy mechanisms to do this within Predictive Intelligence, but with some clever data manipulation, you can make it work.
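
To make the two terms concrete, here is a minimal Python sketch of both techniques on an in-memory list of (record id, label) pairs. It is only an illustration of the ideas, not something Predictive Intelligence provides: the function names, the row layout, and the labels "A" and "B" are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def group_by_label(rows):
    """Group (row_id, label) pairs into a dict of label -> list of rows."""
    groups = defaultdict(list)
    for row_id, label in rows:
        groups[label].append((row_id, label))
    return groups


def undersample(rows, rng):
    """Randomly shrink every class to the size of the smallest class."""
    groups = group_by_label(rows)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, target))
    return out


def oversample(rows, rng):
    """Grow every class to the size of the largest class by repeating
    randomly chosen rows (sampling with replacement)."""
    groups = group_by_label(rows)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choices(g, k=target - len(g)))
    return out


# Hypothetical imbalanced data: 90 rows labelled "A", 10 labelled "B".
rows = [(i, "A") for i in range(90)] + [(i, "B") for i in range(90, 100)]
rng = random.Random(0)

under_counts = Counter(label for _, label in undersample(rows, rng))
over_counts = Counter(label for _, label in oversample(rows, rng))
print(under_counts, over_counts)
```

Undersampling leaves 10 rows per label and discards data from the larger class, while oversampling leaves 90 rows per label and repeats rows from the smaller class.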

 

One example strategy is:

1. Add a True/False custom column to your table to tag (label) the records you want to use in your training set. For example, if you are trying to predict assignment group, create a custom column called "Balanced Assignment Group".

 

2. Create a script that selects the records you want in the training set and sets the column to true, and sets it to false for the records you don't want to use. The script should take an equal number of samples for each value of the field you are trying to predict. For example, if you need 10,000 samples to train a model and want to sample evenly across 10 assignment groups, the script selects 1,000 samples from each assignment group. It then sets "Balanced Assignment Group" to true or false for each record, depending on the logic in your script.


Note: Ideally, in your script, you will select the samples for each assignment group at random (as opposed to only the most recent). The more random your sample, the better your training data will match your production data.

 

3. This results in 10,000 samples that have a balanced class distribution. You can now filter for "Balanced Assignment Group is True" when training your model, which gives you a much more balanced distribution and will likely lead to a more accurate model.
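
The sampling logic in the steps above can be tried outside ServiceNow before writing the real script. The Python sketch below is only an illustration under stated assumptions: the function name `flag_balanced_sample`, the in-memory record layout, and the group names and counts are all hypothetical, and on an instance the same logic would be written as a server-side script that updates the custom column.

```python
import random
from collections import Counter, defaultdict


def flag_balanced_sample(records, label_field, per_class, seed=None):
    """Return {sys_id: bool}, True for a random, equal-sized sample per class.

    A class with fewer than per_class records contributes all of its records.
    """
    rng = random.Random(seed)

    # Collect the record ids that belong to each class (e.g. assignment group).
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_field]].append(rec["sys_id"])

    # Start with every record excluded, then flag the random picks.
    flags = {rec["sys_id"]: False for rec in records}
    for ids in by_class.values():
        k = min(per_class, len(ids))
        for sys_id in rng.sample(ids, k):  # random, not "most recent"
            flags[sys_id] = True
    return flags


# Hypothetical imbalanced table: 600 / 300 / 100 records across three groups.
records = []
for group, count in [("Network", 600), ("Database", 300), ("Hardware", 100)]:
    for i in range(count):
        records.append({"sys_id": f"{group}-{i}", "assignment_group": group})

flags = flag_balanced_sample(records, "assignment_group", per_class=100, seed=42)

# Equivalent of filtering on "Balanced Assignment Group is True".
balanced = Counter(
    r["assignment_group"] for r in records if flags[r["sys_id"]]
)
print(balanced)
```

With these made-up counts, each group ends up with 100 flagged records, so the filtered training set holds 300 records with an even class distribution, and the remaining 700 stay flagged false.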

 

Thanks for the informative excerpt. 

Lener Pacania1
ServiceNow Employee

I covered some ways to tackle class imbalance in a lab at Knowledge 24. See if you can use some of the techniques referenced in the PDF in this article.