ML Solution Definition is throwing error during training

astha_chaubey · ‎04-25-2018

Hi All,

I activated the Agent Intelligence plugin and am trying to run Machine Learning on a custom table by creating ML Solution Definition records. However, whenever I click the "Update and Train" UI action, it throws this error:

Training terminated due to Exception. Executing dedup task failed. text columns can not be null or empty

Please see screenshot

My custom table contains 40000+ records and none of the fields in any record are empty.

I also tried creating a new ML Trainer Definition record, adding it to my ML Solution Definition record, and clicking "Update and Train". Now the ML Solution record stays in the state "Waiting for training" indefinitely and training never starts. I am not sure what the issue is.

Please help me resolve these issues.

john_hurst · ‎05-24-2018

Hi,

It is most likely failing because there are too many duplicates in your data. If the data is extremely similar, to the point where the columns you select take only a small number of distinct values, Agent Intelligence will not have enough values to train with. So even though you have enough records, the record count doesn't matter if there are too many duplicates in your input columns.

As for the "Waiting for training" issue, I'm having that myself.

evgeniygilenko · ‎06-05-2018

Hi,

We are facing exactly the same issue. We have around 50k non-empty, meaningful records. The error is:

Training terminated due to Exception. Executing dedup task failed. text columns can not be null or empty

As input fields we are using a few choice fields. They cannot vary as much as a free-text Short Description, but there is still some diversity in them.

We tried an experiment with a sample of 51k Incidents, completely filled and meaningful. If we use the "standard" categorization, Short Description -> Category, the prediction works fine. As soon as we switch to several choice fields -> Category, the AI throws the same dedup error.

Does this mean there is no way to use only choice fields as input? Do we always need a relatively long free-text input field?

The real issue I see is that we do not have access to the classifier and dedup settings. We don't even know which dedup algorithms and classifiers are in place.

john_hurst · ‎07-18-2018

Hi evgeniygilenko,

I realize how late this reply is, but the problem is that when Agent Intelligence only has a choice field as input, there isn't really much diversity at all.

Say you have 7 choices in this field. If all the algorithm has is those 7 choices pointing to categories, there is no way it will be able to predict from such a small pool of data. It's as if you only really had 7 records to create a model from.

Unless, of course, each choice value points definitively to one category, but in that case you would probably have just created a business rule.

You will almost certainly have to add more input fields to your solution definition.
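To get a feel for how little variety a set of choice fields offers, you could count the distinct input combinations in an export of your records. This is only a rough sketch in Python: the field names and sample records are made up, and it does not reproduce whatever dedup logic Agent Intelligence actually runs.

```python
from collections import Counter


def input_combinations(records, input_fields):
    """Count how often each distinct combination of input values occurs."""
    return Counter(
        tuple(record.get(field) for field in input_fields)
        for record in records
    )


# Hypothetical export: six records with two choice fields used as inputs.
records = [
    {"impact": "1", "urgency": "2", "category": "network"},
    {"impact": "1", "urgency": "2", "category": "network"},
    {"impact": "1", "urgency": "2", "category": "hardware"},
    {"impact": "2", "urgency": "3", "category": "software"},
    {"impact": "2", "urgency": "3", "category": "software"},
    {"impact": "3", "urgency": "3", "category": "hardware"},
]

combos = input_combinations(records, ["impact", "urgency"])
print(f"{len(records)} records, {len(combos)} distinct input combinations")
```

If the number of distinct combinations is tiny compared to the number of records, the inputs probably carry too little information on their own.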

Community Alums · ‎10-25-2018

Hi Astha,

I saw your enquiry earlier and then came across KB article KB0691436 in the HI Portal. It steps through a few suggestions.

You've probably moved on to other items since but thought it would be useful for other users.

The article suggests it could be one of the following issues:

It is advised to have a data set of at least 50k Incidents, although 100k would be an even better amount. A data set is the number of Incidents matching the selection criteria defined on the Solution Definition. If there are not enough Incidents, consider increasing the time window to select more Incidents.
The quality of the model built by the trainer relies on the quality of the data which is provided:
- The Incidents should be linked to at least 2 different values for the Output Field (usually the Category or the Assignment Group).
- There should not be any Incidents with an empty Input Field (usually the Short Description) or an empty Output Field (usually the Category or the Assignment Group).
- The values for the Input Field (usually the Short Description) should be of good quality. Incidents created artificially (by script, etc) are usually not of good quality as their short descriptions are the same or look almost the same. Real Incidents should be used.
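The countable checks in that list could be run against an export of the records before training. The following is a hedged Python sketch, not anything from the KB article itself: the function name, the list-of-dicts record format, and the field names are assumptions, and the 50k threshold is simply the figure quoted above.

```python
def check_training_data(records, input_field, output_field, min_records=50_000):
    """Return a list of problems found in the data (empty list if none)."""

    def is_empty(value):
        # Treat None and whitespace-only strings as empty.
        return value is None or str(value).strip() == ""

    problems = []

    if len(records) < min_records:
        problems.append(f"only {len(records)} records, fewer than {min_records}")

    empty_inputs = sum(1 for r in records if is_empty(r.get(input_field)))
    if empty_inputs:
        problems.append(f"{empty_inputs} records with an empty input field")

    empty_outputs = sum(1 for r in records if is_empty(r.get(output_field)))
    if empty_outputs:
        problems.append(f"{empty_outputs} records with an empty output field")

    output_values = {
        r.get(output_field) for r in records if not is_empty(r.get(output_field))
    }
    if len(output_values) < 2:
        problems.append("fewer than 2 different output values")

    return problems


# Hypothetical sample; min_records is lowered so the example stays small.
sample = [
    {"short_description": "VPN drops every hour", "category": "network"},
    {"short_description": "", "category": "network"},
    {"short_description": "Laptop will not boot", "category": "network"},
]
for problem in check_training_data(
    sample, "short_description", "category", min_records=3
):
    print(problem)
```

The quality of the input text (the last point in the list) is not covered here and still needs a manual look.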

Hope this helps someone 🙂