ClusteringSolution - Global
The ClusteringSolution API is a scriptable object used in Predictive Intelligence stores.
This API requires the Predictive Intelligence plugin (com.glide.platform_ml) and is provided
within the sn_ml namespace.
Typical workflow:
- Create a dataset using the DatasetDefinition API.
- Use the constructor to create a clustering solution object.
- Add the solution object to the clustering solution store using the ClusteringSolutionStore - add() method.
- Train the solution using the submitTrainingJob() method. This creates a version of the object that you can manage using the ClusteringSolutionVersion API.
For usage guidelines, refer to Using ML APIs.
ClusteringSolution - ClusteringSolution(Object config)
Creates a clustering solution.
| Name | Type | Description |
|---|---|---|
| config | Object | JavaScript object containing configuration properties of the solution. |
| config.algorithmConfig | Object | Required unless setting the encoder property. JavaScript object containing algorithm configuration properties. Property settings vary by the value set in the algorithm property. |
| config.algorithmConfig.algorithm | String | Clustering algorithm to use for your solution. Valid values: dbscan, hdbscan, kmeans. Some users prefer DBSCAN because it doesn't require you to specify the number of clusters in the data before clustering. The properties that apply to each algorithm are described in the rows that follow. |
| config.algorithmConfig.distanceMetric | String | DBSCAN algorithm only. Distance metric to scan for similar data objects. Valid values: levenshteinDistance |
| config.algorithmConfig.epsilon | Number | DBSCAN algorithm only. Decimal value between 0 and 1 representing the size of the neighborhood search radius. |
| config.algorithmConfig.minimumNeighbours | Number | DBSCAN algorithm only. Minimum number of neighbors required for a point to be part of a cluster. For levenshteinDistance, the value must be 1 so that no points are excluded from the dataset. |
| config.algorithmConfig.minimumSamples | Number | Minimum number of data samples in a neighborhood required to determine if a point is a core point. Default: None |
| config.algorithmConfig.targetCoverage | Number | K-means algorithm only. Percentile field to filter out records that are less similar to each other. |
| config.clusterConcept | String | Optional. Concept type. A concept is a set of words listed in descending order of frequency. To generate a TFIDF-based cluster concept, set the value to tfidf. Concept types are listed in the Clustering Definitions [ml_capability_definition_clustering] table. Default: Frequency-based cluster concept |
| config.clusterConceptFieldNames | Array | Optional. List of cluster concept field names. These are external columns used only to create the cluster concept, not to train the clustering solution. Cluster concept fields are listed in the Clustering Definitions [ml_capability_definition_clustering] table. Default: Input text columns generate the cluster concept |
| config.dataset | Object | DatasetDefinition object name. |
| config.domainName | String | Optional. Domain name associated with this dataset. Default: Current domain |
| config.encoder | Object | Required unless the algorithmConfig.distanceMetric property is set to "levenshteinDistance". Trained encoder object to assign to this solution. See Encoder - Encoder(Object config). |
| config.groupByFieldName | String | Optional. Field name by which the system groups records into one or more clusters. In the following setup example, the system groups each type into an individual cluster, rendering 10 clusters. |
| config.groupUnclusteredRecords | Boolean | Flag that indicates whether to group unclustered records in results. Valid values: true (group unclustered records), false (don't group unclustered records). Default: false |
| config.inputFieldNames | Array | List of input field names as strings. The model uses these fields to make predictions. |
| config.label | String | Identifies the prediction task. |
| config.maxTimeWindowForUpdate | Number | Optional. Number of minutes preceding the model update point to look for records. For example, if the value is 15, the system only looks for records created in the preceding 15 minutes. By default, the system scans all records. |
| config.minRecordsPerCluster | Number | Optional. Minimum number of records to allow in any cluster. The value must be greater than or equal to 2. Default: 2 |
| config.minRowCount | String | Optional. Minimum number of records required in the dataset for training. Default: 10000 |
| config.processingLanguage | String | Processing language in two-letter ISO 639-1 language code format. |
| config.stopwords | Array | Optional. Preset list of strings that the system automatically generates based on the language property setting. For details, see Create a custom stopwords list. Default: English Stopwords |
| config.trainingFrequency | String | The frequency to retrain the model. Example value: run_once |
| config.updateFrequency | String | The frequency at which the model for the solution definition must be rebuilt. Example value: do_not_update |
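Several constraints in the table above can be checked on a plain configuration object before it is passed to the constructor. The following is a minimal sketch, not part of the sn_ml API: the function name validateClusteringConfig and its messages are hypothetical, and it assumes that "between 0 and 1" for epsilon includes both endpoints.

```javascript
// Hypothetical helper (not provided by sn_ml): checks a plain config object
// against constraints stated in the parameter table and returns a list of
// problem messages. An empty list means no documented constraint was violated.
function validateClusteringConfig(config) {
    var problems = [];
    var algo = config.algorithmConfig || {};

    // epsilon is documented as a decimal value between 0 and 1
    // (assumption: both endpoints are accepted).
    if (algo.epsilon !== undefined &&
        (typeof algo.epsilon !== 'number' || algo.epsilon < 0 || algo.epsilon > 1)) {
        problems.push('algorithmConfig.epsilon must be a number between 0 and 1.');
    }

    // With levenshteinDistance, minimumNeighbours must be 1.
    if (algo.distanceMetric === 'levenshteinDistance' &&
        algo.minimumNeighbours !== undefined && algo.minimumNeighbours !== 1) {
        problems.push('algorithmConfig.minimumNeighbours must be 1 for levenshteinDistance.');
    }

    // minRecordsPerCluster must be greater than or equal to 2.
    if (config.minRecordsPerCluster !== undefined && !(config.minRecordsPerCluster >= 2)) {
        problems.push('minRecordsPerCluster must be at least 2.');
    }

    // inputFieldNames is documented as an array of field names.
    if (config.inputFieldNames !== undefined && !Array.isArray(config.inputFieldNames)) {
        problems.push('inputFieldNames must be an array of strings.');
    }

    return problems;
}

// Sample call: this config violates two of the documented constraints.
var sampleProblems = validateClusteringConfig({
    inputFieldNames: ['short_description'],
    minRecordsPerCluster: 1,
    algorithmConfig: {
        algorithm: 'dbscan',
        distanceMetric: 'levenshteinDistance',
        epsilon: 0.3,
        minimumNeighbours: 3
    }
});
```

The helper only inspects plain values, so it can run before any dataset or encoder object is attached to the configuration.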
The following example shows how to create an object and add it to the ClusteringSolution store. The example also shows how to submit the object for training.
try {
    var myData = new sn_ml.DatasetDefinition({
        'tableName' : 'incident',
        'fieldNames' : ['category', 'short_description', 'state', 'description'],
        'encodedQuery' : 'activeANYTHING'
    });
    // get a trained encoder from the store
    var myEncoder = sn_ml.EncoderStore.get('<encoder_name>');
    var mySolution = new sn_ml.ClusteringSolution({
        'label': "clustering solution",
        'dataset' : myData,
        'encoder' : myEncoder,
        'inputFieldNames': ['short_description'],
        'groupByFieldName' : 'category',
        'algorithmConfig' : {
            'algorithm' : 'kmeans',
            'targetCoverage' : '90'
        }
    });
    // add the solution to the store
    var solutionName = sn_ml.ClusteringSolutionStore.add(mySolution);
    // submit the solution for training and print the status of the new version
    var solutionVersion = mySolution.submitTrainingJob();
    var trainingStatus = solutionVersion.getStatus();
    gs.print(JSON.stringify(JSON.parse(trainingStatus), null, 2));
} catch (ex) {
    gs.print('Exception caught: ' + ex.getMessage());
}
Output:
{
"state": "waiting_for_training",
"percentComplete": "0",
"hasJobEnded": "false"
}
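In the status output above, percentComplete and hasJobEnded are strings, so a direct truthiness check on hasJobEnded would treat "false" as true. The following sketch converts the status string into typed values. The function name summarizeTrainingStatus is hypothetical, and the field names are taken only from the output shown in this document.

```javascript
// Hypothetical helper (not provided by sn_ml): converts the JSON string
// returned by getStatus() into an object with typed fields.
function summarizeTrainingStatus(statusJson) {
    var status = JSON.parse(statusJson);
    return {
        state: status.state,
        // percentComplete arrives as a string such as "0" or "100".
        percentComplete: parseInt(status.percentComplete, 10),
        // hasJobEnded arrives as the string "true" or "false".
        hasJobEnded: String(status.hasJobEnded) === 'true'
    };
}

// Sample call using the status shown in the output above.
var sampleStatus = summarizeTrainingStatus(
    '{"state":"waiting_for_training","percentComplete":"0","hasJobEnded":"false"}'
);
```

On an instance, the argument would be the value returned by solutionVersion.getStatus().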
The following example shows how to include the 'description' field as a cluster concept field.
var myIncidentData = new sn_ml.DatasetDefinition({
    'tableName' : 'incident',
    'fieldNames' : ['category', 'short_description', 'description']
});
// get a trained encoder from the store
var myEncoder = sn_ml.EncoderStore.get('<encoder_name>');
var mySolution = new sn_ml.ClusteringSolution({
    'label': 'clustering_test',
    'dataset': myIncidentData,
    'inputFieldNames': ['short_description'],
    'encoder': myEncoder,
    // use the description field only for the cluster concept
    'clusterConceptFieldNames': ['description']
});
var solutionNameFromStore = sn_ml.ClusteringSolutionStore.add(mySolution);
var myClusterVersion = mySolution.submitTrainingJob();
ClusteringSolution - cancelTrainingJob()
Cancels a job for a solution object that has been submitted for training.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| None |
The following example shows how to cancel an existing training job.
var mySolution = sn_ml.ClusteringSolutionStore.get('ml_sn_global_global_clustering');
mySolution.cancelTrainingJob();
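A script may want to cancel only when a job is still in progress. The following sketch guards the call using methods documented on this page (getLatestVersion(), getStatus(), and cancelTrainingJob()). The function name cancelIfRunning is hypothetical, and the sketch assumes that the latest version is the one being trained and that getLatestVersion() returns a falsy value when no version exists.

```javascript
// Hypothetical helper (not provided by sn_ml): cancels training only if the
// latest version reports that its job has not ended.
// Returns true if cancelTrainingJob() was called, false otherwise.
function cancelIfRunning(solution) {
    var version = solution.getLatestVersion();
    if (!version) {
        return false; // assumption: no version means nothing to cancel
    }
    var status = JSON.parse(version.getStatus());
    // hasJobEnded is the string "true" or "false" in the documented output.
    if (String(status.hasJobEnded) === 'true') {
        return false;
    }
    solution.cancelTrainingJob();
    return true;
}
```

On an instance, the argument would be a solution object retrieved with sn_ml.ClusteringSolutionStore.get().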
ClusteringSolution - getActiveVersion()
Gets the active ClusteringSolutionVersion object.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| Object | Active ClusteringSolutionVersion object. |
The following example shows how to get an active ClusteringSolution version from the store and return its training status.
var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');
gs.print(JSON.stringify(JSON.parse(mlSolution.getActiveVersion().getStatus()), null, 2));
Output:
{
"state": "solution_complete",
"percentComplete": "100",
"hasJobEnded": "true"
}
ClusteringSolution - getAllVersions()
Gets all versions of a clustering solution.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| Array | Existing versions of a solution object. See also ClusteringSolutionVersion API. |
The following example shows how to get all ClusteringSolution version objects and call the getVersionNumber() and getStatus() solution version methods on them.
var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');
var mlSolutionVersions = mlSolution.getAllVersions();
for (var i = 0; i < mlSolutionVersions.length; i++) {
    gs.print("Version " + mlSolutionVersions[i].getVersionNumber() + " Status: " + mlSolutionVersions[i].getStatus() + "\n");
}
Output:
Version 3 Status: {"state":"solution_complete","percentComplete":"100","hasJobEnded":"true"}
Version 2 Status: {"state":"solution_complete","percentComplete":"100","hasJobEnded":"true"}
Version 1 Status: {"state":"solution_cancelled","percentComplete":"0","hasJobEnded":"true"}
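The output above mixes completed and cancelled versions. The following sketch picks the highest version number whose state is solution_complete from an array like the one returned by getAllVersions(). The function name latestCompletedVersionNumber is hypothetical, and the sketch assumes version numbers can be compared numerically.

```javascript
// Hypothetical helper (not provided by sn_ml): returns the highest version
// number whose status state is "solution_complete", or null if none is.
function latestCompletedVersionNumber(versions) {
    var best = null;
    for (var i = 0; i < versions.length; i++) {
        var status = JSON.parse(versions[i].getStatus());
        // Assumption: getVersionNumber() yields a value that converts to a number.
        var number = Number(versions[i].getVersionNumber());
        if (status.state === 'solution_complete' && (best === null || number > best)) {
            best = number;
        }
    }
    return best;
}
```

The loop does not rely on the order in which versions are returned.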
ClusteringSolution - getLatestVersion()
Gets the latest version of a solution.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| Object | ClusteringSolutionVersion object corresponding to the latest version of a ClusteringSolution(). |
The following example shows how to get the latest version of a solution and return its training status.
var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');
gs.print(JSON.stringify(JSON.parse(mlSolution.getLatestVersion().getStatus()), null, 2));
Output:
{
"state": "solution_complete",
"percentComplete": "100",
"hasJobEnded": "true"
}
ClusteringSolution - getName()
Gets the name of the object to use for interaction with the store.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| String | Name of the solution object. |
The following example shows how to update ClusteringSolution dataset information and print the name of the object.
// Define the updated dataset
var myIncidentData = new sn_ml.DatasetDefinition({
'tableName' : 'incident',
'fieldNames' : ['category', 'short_description', 'priority'],
'encodedQuery' : 'activeANYTHING'
});
var eligibleFields = JSON.parse(myIncidentData.getEligibleFields('clustering'));
var myCluster = new sn_ml.ClusteringSolution({
'label': "my clustering solution",
'dataset' : myIncidentData,
'inputFieldNames': eligibleFields['eligibleInputFieldNames'],
'predictedFieldName': 'category'
});
// update solution
sn_ml.ClusteringSolutionStore.update('ml_x_snc_global_global_clustering_solution', myCluster);
// print solution name
gs.print('Solution Name: '+myCluster.getName());
Output:
Solution Name: ml_x_snc_global_global_clustering_solution
ClusteringSolution - getProperties()
Gets solution object properties.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| Object | Contents of the Dataset and ClusteringSolution() object details in the ClusteringSolutionStore. |
| <Object>.algorithmConfig | JavaScript object containing algorithm configuration properties. Property results vary by the value set in the algorithm property. Data type: Object. |
| <Object>.algorithmConfig.algorithm | Clustering algorithm used for the solution, such as dbscan or kmeans. The properties returned for each algorithm are described in the rows that follow. Data type: String. |
| <Object>.algorithmConfig.distanceMetric | DBSCAN algorithm only. Distance metric to scan for similar data objects. Data type: String. |
| <Object>.algorithmConfig.epsilon | DBSCAN algorithm only. Decimal value between 0 and 1 representing the size of the neighborhood search radius. Data type: Number. |
| <Object>.algorithmConfig.minimumNeighbours | DBSCAN algorithm only. Minimum number of neighbors required for a point to be part of a cluster. For levenshteinDistance, the value must be 1 so that no points are excluded from the dataset. Data type: Number. |
| <Object>.algorithmConfig.targetCoverage | K-means algorithm only. Percentile field to filter out records that are less similar to each other. Data type: Number. |
| <Object>.datasetProperties | Lists the properties of the DatasetDefinition() object associated with the solution. Data type: Object. |
| <Object>.datasetProperties.tableName | Name of the table for the dataset. For example, "tableName" : "incident". Data type: String. |
| <Object>.datasetProperties.fieldNames | List of field names from the specified table as strings. For example, "fieldNames" : ["short_description", "priority"]. Data type: Array. |
| <Object>.datasetProperties.fieldDetails | List of JavaScript objects that specify field properties. Data type: Array. |
| <Object>.datasetProperties.fieldDetails.<object>.name | Name of the field defining the type of information to restrict this dataset to. Data type: String. |
| <Object>.datasetProperties.fieldDetails.<object>.type | Machine-learning field type. Data type: String. |
| <Object>.datasetProperties.encodedQuery | Encoded query string in the standard platform format. See Encoded query strings. Data type: String. |
| <Object>.domainName | Domain name associated with this dataset. See Domain separation and Predictive Intelligence. Data type: String. |
| <Object>.groupByFieldName | Field name by which the system groups records into one or more clusters. Data type: String. |
| <Object>.inputFieldNames | List of input field names as strings. The model uses these fields to make predictions. Data type: Array. |
| <Object>.label | Identifies the prediction task. Data type: String. |
| <Object>.minRecordsPerCluster | Minimum number of records to allow in any cluster. Data type: Number. |
| <Object>.name | System-assigned name. Data type: String. |
| <Object>.predictedFieldName | Identifies a field to be trained for predictability. Data type: String. |
| <Object>.processingLanguage | Processing language in two-letter ISO 639-1 language code format. Data type: String. |
| <Object>.scope | Object scope. Currently the only valid value is global. Data type: String. |
| <Object>.stopwords | Optional. Preset list of strings that the system automatically generates based on the language property setting. For details, see Create a custom stopwords list. Data type: Array. |
| <Object>.trainingFrequency | The frequency to retrain the model. Example value: run_once. Data type: String. |
| <Object>.updateFrequency | The frequency at which the model for the solution definition must be rebuilt. Example value: do_not_update. Data type: String. |
The following example gets properties of a solution object in the store.
var myCluster = sn_ml.ClusteringSolutionStore.get("ml_x_snc_global_global_clustering_solution");
gs.print(JSON.stringify(JSON.parse(myCluster.getProperties()), null, 2));
Output:
{
"algorithmConfig": {
"algorithm": "kmeans",
"targetCoverage": "90"
},
"datasetProperties": {
"tableName": "incident",
"fieldNames": [
"category",
"short_description",
"state",
"description"
],
"encodedQuery": "activeANYTHING"
},
"domainName": "global",
"groupByFieldName": "category",
"inputFieldNames": [
"short_description"
],
"label": "clustering solution",
"minRecordsPerCluster": 2,
"name": "ml_x_snc_global_global_clustering_solution",
"processingLanguage": "en",
"scope": "global",
"stopwords": [
"Default English Stopwords"
],
"trainingFrequency": "run_once",
"updateFrequency": "do_not_update"
}
ClusteringSolution - getVersion(String version)
Gets a solution version by the provided version number.
| Name | Type | Description |
|---|---|---|
| version | String | Existing version number of a solution. |
| Type | Description |
|---|---|
| Object | Specified version of the ClusteringSolution() object on which you can call ClusteringSolutionVersion API methods. |
The following example shows how to get the training status of a solution by version number.
var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');
gs.print(JSON.stringify(JSON.parse(mlSolution.getVersion('1').getStatus()), null, 2));
Output:
{
"state": "solution_complete",
"percentComplete": "100",
"hasJobEnded": "true"
}
ClusteringSolution - setActiveVersion(String version)
Activates a specified version of a solution in the store.
| Name | Type | Description |
|---|---|---|
| version | String | Name of the ClusteringSolution() object version to activate. Activating this version deactivates any other version. |
| Type | Description |
|---|---|
| None |
The following example shows how to activate a solution version in the store.
var mySolution = sn_ml.ClusteringSolutionStore.get("ml_incident_categorization");
mySolution.setActiveVersion("1");
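Because activating a version deactivates any other version, a script may want to confirm that the target version finished training first. The following sketch combines getVersion(), getStatus(), and setActiveVersion() from this page. The function name activateIfComplete is hypothetical, and the sketch assumes that setActiveVersion() accepts the same version-number string as getVersion() and that getVersion() returns a falsy value for an unknown version.

```javascript
// Hypothetical helper (not provided by sn_ml): activates a version only if
// its status state is "solution_complete".
// Returns true if setActiveVersion() was called, false otherwise.
function activateIfComplete(solution, versionNumber) {
    var versionId = String(versionNumber);
    var version = solution.getVersion(versionId);
    if (!version) {
        return false; // assumption: unknown versions come back falsy
    }
    var status = JSON.parse(version.getStatus());
    if (status.state !== 'solution_complete') {
        return false; // leave the currently active version untouched
    }
    solution.setActiveVersion(versionId);
    return true;
}
```

On an instance, the first argument would be a solution object retrieved with sn_ml.ClusteringSolutionStore.get().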
ClusteringSolution - submitTrainingJob()
Submits a training job.
| Name | Type | Description |
|---|---|---|
| None |
| Type | Description |
|---|---|
| Object | ClusteringSolutionVersion object corresponding to the ClusteringSolution being trained. |
The following example shows how to create a dataset, apply it to a solution, add the solution to a store, and submit the training job.
// Create a dataset
var myData = new sn_ml.DatasetDefinition({
'tableName' : 'incident',
'fieldNames' : ['assignment_group', 'short_description', 'description'],
'encodedQuery' : 'activeANYTHING'
});
// Create a solution
var mySolution = new sn_ml.ClusteringSolution({
'label': "my solution definition",
'dataset' : myData,
'predictedFieldName' : 'assignment_group',
'inputFieldNames':['short_description']
});
// Add the solution to the store to later be able to retrieve it.
var my_unique_name = sn_ml.ClusteringSolutionStore.add(mySolution);
// Train the solution - this is a long running job
var myClusterVersion = mySolution.submitTrainingJob();