DatasetDefinition - Global

Yokohama API Reference

Release

yokohama

ft:locale

en-US

ft:publication_title

Yokohama API Reference

ft:clusterId

crapiref

bundleId

crapiref

workflow

Creator

DatasetDefinition - Global

Release version: Yokohama

Updated January 30, 2025

3 minutes to read

The DatasetDefinition API provides methods to identify a set of records including a table name, columns, and row selection criteria to use as input for ML training algorithms. Datasets don't contain the actual data.

This API requires the Predictive Intelligence plugin (com.glide.platform_ml) and is provided within the sn_ml namespace. For information, see Predictive Intelligence.

Use the dataset to estimate mutual information PredictabilityEstimate or train data specified by an Encoder. You can also use the dataset to train data specified by one of the following solution types:

For usage guidelines, refer to Using ML APIs.

DatasetDefinition - DatasetDefinition(Object)

Creates an instance of the DatasetDefinition class, enabling you to define a dataset by table name, fields, and query.

Create your dataset definition by passing a table and a list of fields. You can also pass a query to restrict datasets to include rows with specific characteristics.

Once created, a DatasetDefinition object cannot be modified.

Table 1. Parameters
Name	Type	Description
config	Object	JavaScript object containing the dataset definition properties. `{ "encodedQuery": "String", "fieldDetails": [Array], "fieldNames": [Array], "tableName": "String" }`
config.tableName	String	Name of the table for the dataset. For example, `"tableName" : "Incident"`.
config.fieldNames	Array	Optional. List of field names from the specified table as strings. For example, `"fieldNames" : ["short_description", "priority"]`. Default: All fields
config.fieldDetails	Array	Optional. List of JavaScript objects that specify field properties. Use this property to force machine learning algorithms to interpret fields as being a specific type. You do not need to get field details for every field listed in the fieldNames property. All details must correspond with a field listed in the fieldNames array. `[ { "name": "String", "type": "String" } ]`
config.fieldDetails.name	String	Name of the field defining the type of information to restrict this dataset to. If used, this field name must match the corresponding name listed in the fieldNames property.
config.fieldDetails.type	String	Machine-learning field type. Specifying the data type forces the ML trainer to interpret a field as having that type. If no data type is specified, the system determines the type. Supported types: nominal: ML interprets this field as containing classes or categories. numeric: ML interprets this field as containing numbers. text: ML interprets this field as containing text. These types identify data types from a machine learning perspective. The ML type might differ from the type listed in the source table. A field can be a string type, but its purpose can be to encode a nominal value. For example, t-shirt sizes such as "XL", "L", or "M" are string types in the table, but each value represents a categorgy of a nominal attribute from an ML perspective.
config.encodedQuery	String	Optional. Encoded query string in the standard platform format. You can construct the query to be absolute or relative. For example, your query can return rows for the previous 3 months (relative), or for the May through July period (absolute). Whether using an absolute or relative pattern, the data a definition identifies can change if the rows in the underlying table change.

The following example shows how to create a dataset definition.

var myData = new sn_ml.DatasetDefinition(
  { 
     'tableName' : 'incident', 
     'fieldNames' : ['category', 'short_description', 'priority', 'assignment_group.name'],
     'fieldDetails' : [
       {
         'name' : 'category',
         'type' : 'nominal'
       },
       {
         'name' : 'short_description',
         'type' : 'text'
       }], 
     'encodedQuery' : 'sys_created_onONLast%202%20quarters@javascript:gs.beginningOfLast2Quarters()@javascript:gs.endOfLast2Quarters()^state=3'
  });

DatasetDefinition - getEligibleFields(String capability)

Returns a list of fields that are eligible as either input fields (features) or predicted fields regarding a solution of a given capability, for example, a classification solution. Eligibility is determined based on the fields having the appropriate glide data types.

Table 2. Parameters
Name	Type	Description
capability	String	Capability for which to retrieve fields eligible for training. This method currently only supports classification solutions, any other value for the capability throws a "capability not supported" exception. Valid values: `"classification"`

Table 3. Returns
Type	Description
Object	Object containing eligible input field names and eligible output field names. `{ "eligibleInputFieldNames" : [Array], "eligibleOutputFieldNames" : [Array] }`
<Object>.eligibleInputFieldNames	List of strings indicating input fields eligible for training. Data type: Array
<Object>.eligibleOutputFieldNames	List of strings indicating output fields eligible for training. Data type: Array

The following example shows how to display eligible fields for a classification solution.

var myIncidentData = new sn_ml.DatasetDefinition({
  'tableName' : 'incident',
  'encodedQuery' : 'activeANYTHING'
});

var eligibleFields = JSON.parse(myIncidentData.getEligibleFields('classification'));

gs.print(JSON.stringify(eligibleFields, null, 2));

Output:

{
  "eligibleInputFieldNames": [
    "resolved_by",
    "short_description",
    "description",
    "notify"
  ],
  "eligibleOutputFieldNames": [
    "parent",
    "caused_by",
    "location",
    "category"
  ]
}