DatasetDefinition - Global

  • Release version: Yokohama
  • Updated January 30, 2025
  • The DatasetDefinition API provides methods to define a set of records by table name, columns, and row selection criteria, for use as input to ML training algorithms. A dataset definition doesn't contain the actual data.

    This API requires the Predictive Intelligence plugin (com.glide.platform_ml) and is provided within the sn_ml namespace. For information, see Predictive Intelligence.

    Use the dataset as input to a PredictabilityEstimate, which estimates mutual information, or to train an Encoder. You can also use the dataset to train one of the supported solution types.

    For usage guidelines, refer to Using ML APIs.

    DatasetDefinition - DatasetDefinition(Object)

    Creates an instance of the DatasetDefinition class, enabling you to define a dataset by table name, fields, and query.

    Create your dataset definition by passing a table and a list of fields. You can also pass a query to restrict datasets to include rows with specific characteristics.

    Once created, a DatasetDefinition object cannot be modified.
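
    Because a definition cannot be changed after it is constructed, one way to vary a dataset is to keep the configuration as a plain object and derive a modified copy before constructing a new definition. The following is a minimal sketch under that assumption. The deriveConfig helper is hypothetical and not part of the sn_ml API; it only manipulates plain JavaScript objects.

```javascript
// Hypothetical helper (not part of sn_ml): returns a shallow copy of a
// dataset configuration with selected properties overridden.
function deriveConfig(baseConfig, overrides) {
  var result = {};
  var key;
  for (key in baseConfig) {
    if (Object.prototype.hasOwnProperty.call(baseConfig, key)) {
      result[key] = baseConfig[key];
    }
  }
  for (key in overrides) {
    if (Object.prototype.hasOwnProperty.call(overrides, key)) {
      result[key] = overrides[key];
    }
  }
  return result;
}

var baseConfig = {
  tableName: 'incident',
  fieldNames: ['category', 'short_description'],
  encodedQuery: 'state=3'
};

// A second configuration that differs only in its row selection criteria.
// baseConfig itself is left unchanged.
var activeConfig = deriveConfig(baseConfig, { encodedQuery: 'active=true' });
```

    Each configuration can then be passed to its own new sn_ml.DatasetDefinition(...) call, so that every variation gets a separate definition object.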

    Table 1. Parameters
    Name Type Description
    config Object JavaScript object containing the dataset definition properties.
    {
      "encodedQuery": "String",
      "fieldDetails": [Array],
      "fieldNames": [Array],
      "tableName": "String"
    }
    config.tableName String Name of the table for the dataset. For example, "tableName" : "incident".
    config.fieldNames Array Optional. List of field names from the specified table as strings. For example, "fieldNames" : ["short_description", "priority"].

    Default: All fields

    config.fieldDetails Array Optional. List of JavaScript objects that specify field properties.

    Use this property to force machine learning algorithms to interpret fields as a specific type. You do not need to provide field details for every field listed in the fieldNames property, but every entry you do provide must correspond to a field listed in the fieldNames array.

    [
      {
        "name": "String",
        "type": "String"
      }
    ]
    config.fieldDetails.name String Name of the field whose ML type is being specified. If used, this field name must match the corresponding name listed in the fieldNames property.
    config.fieldDetails.type String Machine-learning field type. Specifying the data type forces the ML trainer to interpret a field as having that type. If no data type is specified, the system determines the type.
    Supported types:
    • nominal: ML interprets this field as containing classes or categories.
    • numeric: ML interprets this field as containing numbers.
    • text: ML interprets this field as containing text.

    These types identify data types from a machine learning perspective. The ML type might differ from the type listed in the source table. A field can be a string type, but its purpose can be to encode a nominal value. For example, t-shirt sizes such as "XL", "L", or "M" are string types in the table, but each value represents a category of a nominal attribute from an ML perspective.
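
    The constraints on fieldDetails described above can be checked on the plain configuration object before it is passed to the constructor. The following is a minimal sketch, not platform behavior: checkFieldDetails and SUPPORTED_ML_TYPES are hypothetical names, and skipping the name check when fieldNames is omitted is an assumption based on the documented default of all fields.

```javascript
// Hypothetical pre-flight check (not part of sn_ml): returns a list of
// problems found in config.fieldDetails, or an empty list if none are found.
var SUPPORTED_ML_TYPES = ['nominal', 'numeric', 'text'];

function checkFieldDetails(config) {
  var problems = [];
  var details = config.fieldDetails || [];
  var names = config.fieldNames; // may be omitted (default: all fields)
  for (var i = 0; i < details.length; i++) {
    var detail = details[i];
    // Assumption: the name check only applies when fieldNames is given.
    if (names && names.indexOf(detail.name) === -1) {
      problems.push('fieldDetails name "' + detail.name +
        '" is not listed in fieldNames');
    }
    if (SUPPORTED_ML_TYPES.indexOf(detail.type) === -1) {
      problems.push('fieldDetails type "' + detail.type +
        '" is not nominal, numeric, or text');
    }
  }
  return problems;
}

var problems = checkFieldDetails({
  tableName: 'incident',
  fieldNames: ['category', 'short_description'],
  fieldDetails: [
    { name: 'category', type: 'nominal' },
    { name: 'priority', type: 'number' }
  ]
});
// problems holds two messages: one for the unlisted name "priority"
// and one for the unsupported type "number".
```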

    config.encodedQuery String Optional. Encoded query string in the standard platform format.

    You can construct the query to be absolute or relative. For example, your query can return rows for the previous 3 months (relative), or for the May through July period (absolute). Whether using an absolute or relative pattern, the data a definition identifies can change if the rows in the underlying table change.
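
    The difference between the two patterns can be sketched with plain strings. The relative condition below is copied from the constructor example in this section. The absolute condition assumes the BETWEEN form with gs.dateGenerate() that list filters typically generate, and the year is illustrative; copy the query from a list filter on your instance to confirm the exact string. absoluteRangeQuery is a hypothetical helper, not part of the sn_ml API.

```javascript
// Hypothetical helper (not part of sn_ml): builds an absolute date-range
// condition in the BETWEEN form (assumed format, verify on your instance).
function absoluteRangeQuery(field, startDate, endDate) {
  return field + 'BETWEEN' +
    "javascript:gs.dateGenerate('" + startDate + "','00:00:00')" +
    "@javascript:gs.dateGenerate('" + endDate + "','23:59:59')";
}

// Relative: the window moves as time passes.
var relativeQuery = 'sys_created_onONLast%202%20quarters' +
  '@javascript:gs.beginningOfLast2Quarters()' +
  '@javascript:gs.endOfLast2Quarters()';

// Absolute: May through July of an illustrative year; the window is fixed.
var absoluteQuery = absoluteRangeQuery('sys_created_on', '2024-05-01', '2024-07-31');

// Either condition can be combined with other conditions using ^ (AND).
var relativeConfig = { tableName: 'incident', encodedQuery: relativeQuery + '^state=3' };
var absoluteConfig = { tableName: 'incident', encodedQuery: absoluteQuery + '^state=3' };
```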

    The following example shows how to create a dataset definition.

    var myData = new sn_ml.DatasetDefinition({
      'tableName' : 'incident',
      'fieldNames' : ['category', 'short_description', 'priority', 'assignment_group.name'],
      'fieldDetails' : [
        {
          'name' : 'category',
          'type' : 'nominal'
        },
        {
          'name' : 'short_description',
          'type' : 'text'
        }
      ],
      'encodedQuery' : 'sys_created_onONLast%202%20quarters@javascript:gs.beginningOfLast2Quarters()@javascript:gs.endOfLast2Quarters()^state=3'
    });

    DatasetDefinition - getEligibleFields(String capability)

    Returns the fields that are eligible as input fields (features) or as predicted fields for a solution of the given capability, for example, a classification solution. A field is eligible if it has an appropriate glide data type.

    Table 2. Parameters
    Name Type Description
    capability String Capability for which to retrieve fields eligible for training. This method currently supports only classification solutions. Any other value for the capability throws a "capability not supported" exception.

    Valid values: "classification"

    Table 3. Returns
    Type Description
    Object Object containing eligible input field names and eligible output field names.
    {
      "eligibleInputFieldNames" : [Array],
      "eligibleOutputFieldNames" : [Array]
    }
    <Object>.eligibleInputFieldNames List of strings indicating input fields eligible for training.

    Data type: Array

    <Object>.eligibleOutputFieldNames List of strings indicating output fields eligible for training.

    Data type: Array
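
    Because an unsupported capability throws an exception, a caller may want to wrap the call. The following is a minimal sketch, assuming only that the object passed in exposes getEligibleFields(capability). tryGetEligibleFields is a hypothetical helper, not part of the sn_ml API. It parses a string result as JSON, as the example in this section does, and passes an object result through unchanged.

```javascript
// Hypothetical wrapper (not part of sn_ml). `dataset` is any object exposing
// getEligibleFields(capability), such as a DatasetDefinition instance.
function tryGetEligibleFields(dataset, capability) {
  try {
    var result = dataset.getEligibleFields(capability);
    // Accept either a JSON string or an already-parsed object.
    var fields = (typeof result === 'string') ? JSON.parse(result) : result;
    return { ok: true, fields: fields };
  } catch (e) {
    // Reached when the capability is not supported or the result cannot be parsed.
    return { ok: false, error: String(e) };
  }
}
```

    A caller can then check the ok property before reading fields.eligibleInputFieldNames or fields.eligibleOutputFieldNames.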

    The following example shows how to display eligible fields for a classification solution.

    var myIncidentData = new sn_ml.DatasetDefinition({
      'tableName' : 'incident',
      'encodedQuery' : 'activeANYTHING'
    });
    
    var eligibleFields = JSON.parse(myIncidentData.getEligibleFields('classification'));
    
    gs.print(JSON.stringify(eligibleFields, null, 2));

    Output:

    {
      "eligibleInputFieldNames": [
        "resolved_by",
        "short_description",
        "description",
        "notify"
      ],
      "eligibleOutputFieldNames": [
        "parent",
        "caused_by",
        "location",
        "category"
      ]
    }
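
    Once the eligible field names are available, they can be compared against the fields you intend to train on. The following is a minimal sketch that works on a plain object with the same shape as the output above. partitionByEligibility is a hypothetical helper, not part of the sn_ml API.

```javascript
// Hypothetical helper (not part of sn_ml): splits requested field names into
// those present in an eligibility list and those that are not.
function partitionByEligibility(requested, eligibleNames) {
  var eligible = [];
  var ineligible = [];
  for (var i = 0; i < requested.length; i++) {
    if (eligibleNames.indexOf(requested[i]) !== -1) {
      eligible.push(requested[i]);
    } else {
      ineligible.push(requested[i]);
    }
  }
  return { eligible: eligible, ineligible: ineligible };
}

// Same shape and values as the sample output in this section.
var eligibleFields = {
  eligibleInputFieldNames: ['resolved_by', 'short_description', 'description', 'notify'],
  eligibleOutputFieldNames: ['parent', 'caused_by', 'location', 'category']
};

var inputs = partitionByEligibility(
  ['short_description', 'priority'],
  eligibleFields.eligibleInputFieldNames
);
// inputs.eligible   -> ['short_description']
// inputs.ineligible -> ['priority']
```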