Minimum number of records for a Similarity model

Johan H · ‎08-28-2023

We have a use case where we want to predict skills per group for a smaller number of groups, 16 at the moment. The groups are the traditional IT groups such as Network, Server, Voice and so on.

As a ServiceNow developer with less than 1 year of experience, I assume that I need one similarity model per group, each targeting a specific set of skills. And yes, I've created one skill type per team. So, as the incidents are dispatched to the team in scope, PI predicts the skill from a subset of our skills based on the short description.

Now, not all groups will have the 10 skills required to set up and train a similarity model. You can change the value through glide.platform_ml.api.min_similarity_window_records, which I'm tempted to do. 5 seems like a more reasonable number from this perspective. With fewer than 5 skills, all group members should be able to cover all types of tickets assigned to them.

But, from the more experienced developers out there, what are the downsides of tweaking this number from 10 to 5? I assume less accurate predictions for the models, anything else?

Lener Pacania1 · ‎08-28-2023

Should be ok to set to 5. Similarity is a mathematical comparison vs a full-on supervised ML model that would require a lot of data. Make sure your word corpus captures all the text that you are using for input; you can see an example here. If setting min_similarity_window_records to 5 doesn't produce the desired results, let us know in the forum and I'll check with engineering/support. -Lener
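
To illustrate why a similarity comparison can work with only a handful of records, here is a minimal sketch in plain Python. It is not how Predictive Intelligence is implemented: it swaps in a simple bag-of-words cosine similarity in place of the word corpus, and the sample records, skill names, and function names are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a short description into a lowercase bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_skill(new_text, labeled_records):
    """Return the skill of the most similar labeled record, plus its score."""
    query = vectorize(new_text)
    best_skill, best_score = None, -1.0
    for description, skill in labeled_records:
        score = cosine_similarity(query, vectorize(description))
        if score > best_score:
            best_skill, best_score = skill, score
    return best_skill, best_score


# Hypothetical records for one group: (short description, skill).
records = [
    ("vpn connection drops every hour", "VPN"),
    ("switch port down in server room", "Switching"),
    ("firewall rule blocks outbound traffic", "Firewall"),
    ("wifi access point not broadcasting", "Wireless"),
    ("dns lookup fails for internal hosts", "DNS"),
]

skill, score = predict_skill("vpn connection keeps dropping", records)
print(skill, round(score, 2))
```

Even with five records the nearest match is found, but any word that never appears in the records contributes nothing to the score, which is why the coverage of the word corpus matters more as the record count shrinks.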

Johan H · ‎08-29-2023

Thank you @Lener Pacania1. I will proceed with the development and update the post if we don't get the desired results.

Johan H · ‎08-29-2023

A follow-up question regarding the word corpus. Would you, @Lener Pacania1, recommend that we limit it to only closed tickets assigned to the assignment group for which we are predicting the skills, or should it include all closed tickets from the last 6 months?

Lener Pacania1 · ‎08-29-2023

Do all the tickets from the last 6 months have skills assigned to them? You'll notice that when you define the model, PI asks for the output field, so if your output field is skill, it will need to see incidents with skills assigned to them to learn from. Is it possible for you to provide a screenshot of the table you are predicting from?
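
One way to answer that question before training is to count, per assignment group, how many exported incidents actually have the output field filled in. The sketch below assumes the incidents have been exported as a list of dictionaries; the field names `assignment_group` and `skill`, the sample data, and the threshold of 5 are all assumptions for illustration.

```python
from collections import Counter

MIN_RECORDS = 5  # assumed threshold, mirroring the lowered property value


def labeled_counts(incidents, output_field="skill"):
    """Count incidents per assignment group that have a non-empty output field."""
    counts = Counter()
    for incident in incidents:
        if incident.get(output_field):
            counts[incident["assignment_group"]] += 1
    return counts


def groups_below_minimum(incidents, groups, minimum=MIN_RECORDS):
    """List the groups with too few labeled incidents to train on."""
    counts = labeled_counts(incidents)
    return sorted(g for g in groups if counts[g] < minimum)


# Hypothetical export: Network has 5 labeled incidents, Server only 2.
incidents = (
    [{"assignment_group": "Network", "skill": "VPN"}] * 5
    + [{"assignment_group": "Server", "skill": "Linux"}] * 2
    + [{"assignment_group": "Server", "skill": ""}]  # unlabeled, not counted
)

print(groups_below_minimum(incidents, ["Network", "Server", "Voice"]))
```

Groups that show up in the result either need more labeled history or are small enough that skill prediction may not be worth setting up for them.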