| Label |
Enter a unique name for your clustering solution. For example, in this use case you could enter Group Incidents to a Major Incident. |
| Name |
As you enter your solution Label, this field automatically populates with a system-assigned name based on your Label value. |
| Word Corpus |
If you have a legacy clustering solution, you can select a relevant word corpus from the Word Corpus field in the definition form.
Note: With the Zurich release, a word corpus is not required, because a pre-trained model is used instead. The Word Corpus field is not visible in the definition form for
pre-trained models.
For more information, see Create a word corpus.
|
| Table |
Select the table that contains record types that you want to group into one or more clusters. For example, in this use case you select the Incident [incident] table as it contains
incident records you want to group together for a major incident analysis.
When you assign a table value, a link appears in the form that shows the number of records that match your current conditions.
|
| Fields |
Select one or more input fields that help the system identify the records you want to include in your cluster. In this use case, use Short description.
Note: When selecting a reference type field, you must dot-walk to the field's property name. For example, instead of short_description, enter
short_description.name.
|
| Use Group By |
Select this check box only if you want to group input records by a field before creating clusters.
Note: Selecting this check box activates the Group By list. If you don't select the check box, all the table records are grouped into clusters. |
| Group By |
Selecting a value from this list is optional. If you do so, the system groups records into one or more clusters based on your selection.
|
| Purity Fields |
Choose fields from your table that can help the system identify the class that is most frequent in the cluster. In this example scenario, select Category and Assignment
group.Name. |
| Filter |
Add filter conditions to apply to the input field records that you want to include in your clusters.
- The maximum number of records for clustering is limited to 300,000.
- For best results, aim for at least 2,000 records.
Note: Script includes can't be referenced from the Filter. Use database views as an alternative.
|
| Processing Language |
Select the dominant language of the dataset that you're training the solution definition on. For example, if the dataset language is Italian, choose Italian. English processing is applied to all datasets by default, so if you select Italian, the system processes the data in both English and Italian.
Note: The term processing refers to some of the language-specific steps used as part of training a solution, such as tokenizing words, removing stop words, and stemming. |
| Stopwords |
When you select your processing language, the system automatically adds a Stopwords list in that language. For example, if your processing language is Italian, the Default Italian
Stopwords list appears. The Default English Stopwords list is also included. If you create a custom stopwords list, you can select it from the Stopwords field to add to
your solution. |
| Update Frequency |
Select how often you want the system to update your clusters with new and updated records.
Note: The system pulls records based on the Group By filter conditions that you set on your clustering solution, if any.
For example, if you select Every 15 minutes, the system identifies which records have arrived within that time frame. The system tries to assign them to the existing clusters, or
creates a new cluster if possible.
Suppose that 20 new records arrive. If 16 of those records fit into existing clusters and 4 don't, the system forms a new cluster for the 4 unassigned records.
You can also choose not to update your clusters at all.
|
| Training Frequency |
Select how often you want the system to discard all previous cluster results and recreate clusters from the beginning.
Your options are daily, every third day, every seven days, or monthly. You can also choose to train your cluster once.
Note: The ML scheduler limits each instance to 50 new ML training requests within a 24-hour window. Scheduled retraining requests don't count toward this limit. Clustering and similarity updates are also excluded from the limit, even when new training requests exceed 50 within a 24-hour window. |
| Minimum number of records per cluster |
Enter the minimum number of records you want a cluster to contain. The value you enter must be 2 or higher. |
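
The numeric limits described in the Filter and Minimum number of records per cluster rows can be checked before you fill in the form. The following Python snippet is a minimal sketch of such a check, not part of the product: the function name `check_definition` and its parameters are hypothetical, and only the three thresholds (300,000 maximum records, 2,000 recommended records, and a minimum cluster size of 2) come from the table above.

```python
# Thresholds taken from the field descriptions above.
MAX_RECORDS = 300_000             # maximum number of records for clustering
RECOMMENDED_MIN_RECORDS = 2_000   # recommended minimum for best results
MIN_CLUSTER_SIZE = 2              # lowest allowed "minimum records per cluster"


def check_definition(record_count, min_records_per_cluster):
    """Check a planned solution definition against the documented limits.

    Hypothetical helper: returns (errors, warnings) as two lists of strings.
    Errors are hard limits; warnings are recommendations.
    """
    errors = []
    warnings = []

    if record_count > MAX_RECORDS:
        errors.append(
            f"{record_count} records exceeds the maximum of {MAX_RECORDS}; "
            "tighten the Filter conditions."
        )
    elif record_count < RECOMMENDED_MIN_RECORDS:
        warnings.append(
            f"{record_count} records is below the recommended "
            f"{RECOMMENDED_MIN_RECORDS}; results may be poor."
        )

    if min_records_per_cluster < MIN_CLUSTER_SIZE:
        errors.append(
            f"Minimum records per cluster must be {MIN_CLUSTER_SIZE} or higher."
        )

    return errors, warnings


if __name__ == "__main__":
    # Example: a filter that matches 1,500 incidents, clusters of 3 or more.
    errs, warns = check_definition(record_count=1_500, min_records_per_cluster=3)
    print("errors:", errs)
    print("warnings:", warns)
```

The record count to pass in is the number shown by the matching-records link that appears after you assign a table value.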