ClusteringSolution - Global

Xanadu API Reference

Release version: Xanadu

Updated: August 1, 2024

The ClusteringSolution API is a scriptable object used in the Predictive Intelligence store.

This API requires the Predictive Intelligence plugin (com.glide.platform_ml) and is provided within the sn_ml namespace.

The flow from setting up a solution to training it is as follows:

Create a dataset using the DatasetDefinition API.
Required when using the K-means clustering algorithm: build an encoder using the Encoder API.
Create a clustering solution object using the constructor.
Add the solution object to the clustering solution store using the ClusteringSolutionStore - add() method.
Train the solution using the submitTrainingJob() method. This creates a version of the object that you can manage using the ClusteringSolutionVersion API.
Get predictions using the ClusteringSolutionVersion - predict() method.

Note:

This API runs with full privileges. To restrict user access, include an access control mechanism in your script.

For usage guidelines, see "Using ML APIs".
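
As a rough illustration of the access control note above, the sketch below wraps an ML operation in a role check. It assumes the caller passes in the server-side GlideSystem object (gs), which exposes hasRole(). The runIfAuthorized helper, the ml_admin role name, and the <solution_name> placeholder are hypothetical and not part of the ClusteringSolution API.

```javascript
// Sketch only: runIfAuthorized and the 'ml_admin' role name are assumptions.
// glideSystem is expected to expose hasRole(), as the server-side gs object does.
function runIfAuthorized(glideSystem, requiredRole, action) {
    if (!glideSystem.hasRole(requiredRole)) {
        // Caller lacks the role: skip the ML operation entirely.
        return { allowed: false, result: null };
    }
    return { allowed: true, result: action() };
}

// Possible use in a server-side script (not executed here):
// var outcome = runIfAuthorized(gs, 'ml_admin', function () {
//     return sn_ml.ClusteringSolutionStore.get('<solution_name>').getName();
// });
```

Passing gs in as an argument keeps the check easy to exercise outside an instance; a script could equally call gs.hasRole() inline.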

ClusteringSolution - ClusteringSolution(Object config)

Creates a clustering solution.

Table 1. Parameters
Name	Type	Description
config	Object	JavaScript object containing the configuration properties of the solution. `{ "algorithmConfig": {Object}, "clusterConcept": "String", "clusterConceptFieldNames": [Array], "dataset": {Object}, "domainName": "String", "encoder": {Object}, "groupByFieldName": "String", "groupUnclusteredRecords": Boolean, "inputFieldNames": [Array], "label": "String", "maxTimeWindowForUpdate" : Number, "minRecordsPerCluster" : Number, "minRowCount": "String", "processingLanguage": "String", "stopwords": [Array], "trainingFrequency": "String", "updateFrequency": "String" }`
config.algorithmConfig	Object	Required unless setting the encoder property. JavaScript object containing the algorithm configuration properties. The property settings depend on the value set in the algorithm property. `'algorithmConfig': { "algorithm": "String", // See algorithmConfig.algorithm setting description for property settings based on algorithm }`
config.algorithmConfig.algorithm	String	Method with which to encode the solution. Valid values: dbscan: Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm. Properties used with this algorithm: distanceMetric, epsilon, minimumNeighbours. hdbscan: Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clustering algorithm. Property used with this algorithm: minimumSamples. kmeans: K-means clustering algorithm. This algorithm uses the targetCoverage property. Default: kmeans. Some users prefer DBSCAN because it does not require specifying the number of clusters in the data before clustering. dbscan properties: `'algorithmConfig': { "algorithm": "dbscan", "distanceMetric": "String", "epsilon": Number, "minimumNeighbours": Number }` hdbscan properties: `'algorithmConfig': { "algorithm": "hdbscan", "minimumSamples": Number }` kmeans properties: `'algorithmConfig': { "algorithm": "kmeans", "targetCoverage": Number }`
config.algorithmConfig.distanceMetric	String	DBSCAN algorithm only. Distance metric with which to scan for similar data objects. Valid value: levenshteinDistance
config.algorithmConfig.epsilon	Number	DBSCAN algorithm only. Decimal value between 0 and 1 representing the size of the neighborhood search radius.
config.algorithmConfig.minimumNeighbours	Number	DBSCAN algorithm only. Minimum number of neighbors required for a point to be part of a cluster. For levenshteinDistance, the value must be 1 so that no points are excluded from the dataset.
config.algorithmConfig.minimumSamples	Number	HDBSCAN algorithm only. Minimum number of neighboring data samples required to determine whether a point is a core point. Default: None
config.algorithmConfig.targetCoverage	Number	K-means algorithm only. Percentile field used to exclude records that are not very similar to each other.
config.clusterConcept	String	Optional. Concept type. A concept is a set of words listed in descending order of frequency. To generate TFIDF-based cluster concepts, set the value to `tfidf`. Concept types are listed in the Clustering Definition [ml_capability_definition_clustering] table. Default: Frequency-based cluster concepts
config.clusterConceptFieldNames	Array	Optional. List of cluster concept field names. These values are external columns used only to create cluster concepts; they are not used to train the clustering solution. Cluster concept fields are listed in the Clustering Definition [ml_capability_definition_clustering] table. Default: Cluster concepts are generated from the input text columns
config.dataset	Object	Name of the DatasetDefinition object.
config.domainName	String	Optional. Domain name associated with this dataset. See "Domain separation and Predictive Intelligence". Default: Current domain. Example: `"global"`.
config.encoder	Object	Required unless the algorithmConfig distanceMetric property is set to `"levenshteinDistance"`. Trained encoder object to assign to this solution. See "Encoder - Encoder(Object config)".
config.groupByFieldName	String	Optional. Field name by which to group records into one or more clusters. In the following example setup, each type is grouped into its own cluster, rendering 10 clusters: the groupByFieldName value is `'category'`, the DatasetDefinition tableName value is `'incident'`, and the Incident [incident] table has 10 category types.
config.groupUnclusteredRecords	Boolean	Flag that indicates whether to group unclustered records in the results. Valid values: true: Group unclustered records separately in the results. false: Do not group unclustered records in the results; unclustered values (-1) appear with the rest of the results. Default: false
config.inputFieldNames	Array	List of input field names as strings. The model uses these fields to make predictions.
config.label	String	Identifies the prediction task.
config.maxTimeWindowForUpdate	Number	Optional. Time, in minutes, to look back from the model update point when searching for records. For example, if the value is 15, the system only searches records created within the last 15 minutes. By default, all records are scanned.
config.minRecordsPerCluster	Number	Optional. Minimum number of records allowed in a cluster. The value must be 2 or greater. Default: 2
config.minRowCount	String	Optional. Minimum number of records required in the dataset for training. Default: 10000
config.processingLanguage	String	Processing language in two-letter ISO 639-1 language code format.
config.stopwords	Array	Optional. Preset list of strings automatically generated based on the processingLanguage setting. For details, see "Creating a custom stopwords list". Default: English stopwords
config.trainingFrequency	String	Frequency at which to retrain the model. Possible values: every_30_days, every_60_days, every_90_days, every_120_days, every_180_days, run_once. Default: run_once
config.updateFrequency	String	Frequency at which the model for the solution definition must be rebuilt. Possible values: do_not_update, every_1_day, every_1_hour, every_6_hours, every_12_hours, every_1_minute, every_15_minutes, every_30_minutes. Default: do_not_update
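
Because the algorithmConfig properties differ by algorithm, a script may want to check the object before passing it to the constructor. The following is a minimal sketch of such a check. The validateAlgorithmConfig helper is hypothetical. Property names come from the table above, but treating them as mandatory for each algorithm and treating the epsilon bounds as inclusive are assumptions.

```javascript
// Sketch only: validateAlgorithmConfig is a hypothetical helper, not part of sn_ml.
// Property names come from the parameter table; treating them as mandatory per
// algorithm and treating the epsilon bounds as inclusive are assumptions.
function validateAlgorithmConfig(algorithmConfig) {
    var expected = {
        dbscan: ['distanceMetric', 'epsilon', 'minimumNeighbours'],
        hdbscan: ['minimumSamples'],
        kmeans: ['targetCoverage']
    };
    var problems = [];
    var algorithm = algorithmConfig.algorithm || 'kmeans'; // kmeans is the documented default

    if (!Object.prototype.hasOwnProperty.call(expected, algorithm)) {
        return ['Unknown algorithm: ' + algorithm];
    }
    expected[algorithm].forEach(function (name) {
        if (algorithmConfig[name] === undefined) {
            problems.push('Missing ' + name + ' for ' + algorithm);
        }
    });
    if (algorithm === 'dbscan') {
        var eps = algorithmConfig.epsilon;
        if (typeof eps === 'number' && (eps < 0 || eps > 1)) {
            problems.push('epsilon must be between 0 and 1');
        }
        // The table states minimumNeighbours must be 1 for levenshteinDistance.
        if (algorithmConfig.distanceMetric === 'levenshteinDistance' &&
            algorithmConfig.minimumNeighbours !== undefined &&
            algorithmConfig.minimumNeighbours !== 1) {
            problems.push('minimumNeighbours must be 1 with levenshteinDistance');
        }
    }
    return problems;
}
```

An empty array means no problems were found by these checks; it does not guarantee that the platform accepts the configuration.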

The following example shows how to create a ClusteringSolution object and add it to the store. It also shows how to submit the object for training.

try{
    var myData = new sn_ml.DatasetDefinition({
        'tableName' : 'incident',
        'fieldNames' : ['category', 'short_description', 'state', 'description'],
        'encodedQuery' : 'activeANYTHING'
    });

    // get a trained encoder from the store
    var myEncoder = sn_ml.EncoderStore.get('<encoder_name>');
        
    var mySolution = new sn_ml.ClusteringSolution({
        'label': "clustering solution",
        'dataset' : myData,
        'encoder' : myEncoder,
        'inputFieldNames':['short_description'],                
        'groupByFieldName' : 'category',        
        'algorithmConfig' : {
            'algorithm' : 'kmeans',
            'targetCoverage' : '90'
        }
    });
    
    // add solution
    var solutionName = sn_ml.ClusteringSolutionStore.add(mySolution);
    var solutionVersion = mySolution.submitTrainingJob();    
    var trainingStatus = solutionVersion.getStatus();
    gs.print(JSON.stringify(JSON.parse(trainingStatus), null, 2));

} catch(ex){
    gs.print('Exception caught: '+ ex.getMessage());
}

Output:

{
  "state": "waiting_for_training",
  "percentComplete": "0",
  "hasJobEnded": "false"
}
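
In the output above, every value is a string, including percentComplete and hasJobEnded. A script that branches on the status may want to convert these first. The sketch below shows one way to do that; the parseTrainingStatus helper is hypothetical, and the field names are taken from the sample output.

```javascript
// Sketch only: parseTrainingStatus is a hypothetical helper, not part of sn_ml.
// It converts the string fields seen in the sample getStatus() output
// into a number and a boolean.
function parseTrainingStatus(statusJson) {
    var raw = JSON.parse(statusJson);
    return {
        state: raw.state,
        percentComplete: parseInt(raw.percentComplete, 10),
        hasJobEnded: raw.hasJobEnded === 'true'
    };
}

// Possible use in a server-side script (not executed here):
// var status = parseTrainingStatus(solutionVersion.getStatus());
// if (status.hasJobEnded) { gs.print('Final state: ' + status.state); }
```

Comparing against the string 'true' matters here because the non-empty string "false" is truthy in JavaScript.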

The following example shows how to include the 'description' field as a cluster concept field.

var myIncidentData = new sn_ml.DatasetDefinition({
    'tableName' : 'incident',
    'fieldNames' : ['category', 'short_description', 'description']
});

var encodersolutionName = sn_ml.EncoderStore.get('<encoder_name>');

var mySolution = new sn_ml.ClusteringSolution({
	'label': 'clustering_test',
	'dataset': myIncidentData,
	'inputFieldNames': ['short_description'],
	'encoder': encodersolutionName,
	'clusterConceptFieldNames': ['description']
});

var solutionNameFromStore = sn_ml.ClusteringSolutionStore.add(mySolution);
var myClassifier = mySolution.submitTrainingJob();

ClusteringSolution - cancelTrainingJob()

Cancels a job for a solution object that has been submitted for training.

Table 2. Parameters
Name	Type	Description
None

Table 3. Returns
Type	Description
None

The following example shows how to cancel an existing training job.

var mySolution = sn_ml.ClusteringSolutionStore.get('ml_sn_global_global_clustering');

mySolution.cancelTrainingJob();

ClusteringSolution - getActiveVersion()

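Gets the active ClusteringSolutionVersion object.

Table 4. Parameters
Name	Type	Description
None

Table 5. Returns
Type	Description
Object	Active ClusteringSolutionVersion object.

The following example shows how to get the active version from the ClusteringSolution store and return its training status.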

var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');

gs.print(JSON.stringify(JSON.parse(mlSolution.getActiveVersion().getStatus()), null, 2));

Output:

{
  "state": "solution_complete",
  "percentComplete": "100",
  "hasJobEnded": "true"
}

ClusteringSolution - getAllVersions()

Gets all versions of a clustering solution.

Table 6. Parameters
Name	Type	Description
None

Table 7. Returns
Type	Description
Array	Existing versions of the solution object. See also the ClusteringSolutionVersion API.

The following example shows how to get all ClusteringSolution version objects and call the getVersionNumber() and getStatus() solution version methods on them.

var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');

var mlSolutionVersions = mlSolution.getAllVersions();

for (var i = 0; i < mlSolutionVersions.length; i++) {
    gs.print("Version " + mlSolutionVersions[i].getVersionNumber() + " Status: " + mlSolutionVersions[i].getStatus() + "\n");
}

Output:

Version 3 Status: {"state":"solution_complete","percentComplete":"100","hasJobEnded":"true"}

Version 2 Status: {"state":"solution_complete","percentComplete":"100","hasJobEnded":"true"}

Version 1 Status: {"state":"solution_cancelled","percentComplete":"0","hasJobEnded":"true"}
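
The output above mixes completed and cancelled versions. As a sketch of how a script might pick the newest version that finished training, the helper below filters on the state field. The latestCompleteVersion helper is hypothetical. It assumes each element exposes getVersionNumber() and getStatus() as in the example, and that solution_complete is the state to look for, as seen in the sample output.

```javascript
// Sketch only: latestCompleteVersion is a hypothetical helper, not part of sn_ml.
// versions: array of objects exposing getVersionNumber() and getStatus(),
// where getStatus() returns a JSON string like the sample output.
function latestCompleteVersion(versions) {
    var best = null;
    for (var i = 0; i < versions.length; i++) {
        var status = JSON.parse(versions[i].getStatus());
        if (status.state !== 'solution_complete') {
            continue; // skip cancelled or unfinished versions
        }
        if (best === null ||
            Number(versions[i].getVersionNumber()) > Number(best.getVersionNumber())) {
            best = versions[i];
        }
    }
    return best; // null when no version has completed training
}

// Possible use in a server-side script (not executed here):
// var newest = latestCompleteVersion(mlSolution.getAllVersions());
```

The version numbers are compared numerically rather than relying on the array order, since the ordering of getAllVersions() is not stated here.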

ClusteringSolution - getLatestVersion()

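Gets the latest version of a solution.

Table 8. Parameters
Name	Type	Description
None

Table 9. Returns
Type	Description
Object	ClusteringSolutionVersion object corresponding to the latest version of the ClusteringSolution() object.

The following example shows how to get the latest version of a solution and return its training status.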

var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');

gs.print(JSON.stringify(JSON.parse(mlSolution.getLatestVersion().getStatus()), null, 2));

Output:

{
  "state": "solution_complete",
  "percentComplete": "100",
  "hasJobEnded": "true"
}

ClusteringSolution - getName()

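Gets the name of the object to use for interaction with the store.

Table 10. Parameters
Name	Type	Description
None

Table 11. Returns
Type	Description
String	Name of the solution object.

The following example shows how to update ClusteringSolution dataset information and print the name of the object.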

// Define the updated dataset
var myIncidentData = new sn_ml.DatasetDefinition({
   'tableName' : 'incident',
   'fieldNames' : ['category', 'short_description', 'priority'],
   'encodedQuery' : 'activeANYTHING'
});

var eligibleFields = JSON.parse(myIncidentData.getEligibleFields('clustering'));

var myCluster = new sn_ml.ClusteringSolution({
   'label': "my clustering solution",
   'dataset' : myIncidentData,
   'inputFieldNames': eligibleFields['eligibleInputFieldNames'],
   'predictedFieldName': 'category'
});

// update solution
sn_ml.ClusteringSolutionStore.update('ml_x_snc_global_global_clustering_solution', myCluster);

// print solution name
gs.print('Solution Name: '+myCluster.getName());

Output:

Solution Name: ml_x_snc_global_global_clustering_solution

ClusteringSolution - getProperties()

Gets the solution object properties.

Table 12. Parameters
Name	Type	Description
None

Table 13. Returns
Type	Description
Object	Contents of the dataset in the ClusteringSolutionStore and details of the ClusteringSolution() object. `{ "algorithmConfig": {Object}, "datasetProperties": {Object}, "domainName": "String", "encoderProperties": {Object}, "groupByFieldName": "String", "inputFieldNames": [Array], "label": "String", "minRecordsPerCluster" : Number, "name": "String", "processingLanguage": "String", "scope": "String", "stopwords": [Array], "trainingFrequency": "String", "updateFrequency": "String" }`
<Object>.algorithmConfig	JavaScript object containing the algorithm configuration properties. The resulting properties depend on the value set in the algorithm property. `'algorithmConfig' : { "algorithm": "String", // See algorithmConfig.algorithm setting description for property settings based on algorithm }` Data type: Object.
<Object>.algorithmConfig.algorithm	Method with which the solution is encoded. dbscan properties: `'algorithmConfig': { "algorithm": "dbscan", "distanceMetric": "String", "epsilon": Number, "minimumNeighbours": Number }` kmeans properties: `'algorithmConfig': { "algorithm": "kmeans", "targetCoverage": Number }` Data type: String.
<Object>.algorithmConfig.distanceMetric	DBSCAN algorithm only. Distance metric with which to scan for similar data objects. Data type: String.
<Object>.algorithmConfig.epsilon	DBSCAN algorithm only. Decimal value between 0 and 1 representing the size of the neighborhood search radius. Data type: Number.
<Object>.algorithmConfig.minimumNeighbours	DBSCAN algorithm only. Minimum number of neighbors required for a point to be part of a cluster. For levenshteinDistance, the value must be 1 so that no points are excluded from the dataset. Data type: Number.
<Object>.algorithmConfig.targetCoverage	K-means algorithm only. Percentile field used to exclude records that are not very similar to each other. Data type: Number.
<Object>.datasetProperties	Lists the properties of the DatasetDefinition() object associated with the solution. `{ "encodedQuery": "String", "fieldDetails": [Array], "fieldNames": [Array], "tableName": "String" }` Data type: Object.
<Object>.datasetProperties.tableName	Name of the table for the dataset. Example: `"tableName" : "Incident"` Data type: String.
<Object>.datasetProperties.fieldNames	List of field names, as strings, from the specified table. Example: `"fieldNames" : ["short_description", "priority"]` Data type: Array.
<Object>.datasetProperties.fieldDetails	List of JavaScript objects that specify field properties. `[ { "name": "String", "type": "String" } ]` Data type: Array.
<Object>.datasetProperties.fieldDetails.<object>.name	Name of a field that defines the type of information to which this dataset is restricted. Data type: String.
<Object>.datasetProperties.fieldDetails.<object>.type	Machine-learning field type. Data type: String.
<Object>.datasetProperties.encodedQuery	Encoded query string in standard Glide format. See "Encoded query strings". Data type: String.
<Object>.domainName	Domain name associated with this dataset. See "Domain separation and Predictive Intelligence". Data type: String.
<Object>.encoderProperties	Encoder object assigned to this solution. See "Encoder - Encoder(Object config)". Data type: Object.
<Object>.groupByFieldName	Field name by which to group records into one or more clusters. Data type: String.
<Object>.inputFieldNames	List of input field names as strings. The model uses these fields to make predictions. Data type: Array.
<Object>.label	Identifies the prediction task. `{ "label": "my first prediction" }` Data type: String.
<Object>.minRecordsPerCluster	Minimum number of records allowed in a cluster. Data type: Number.
<Object>.name	System-assigned name. Data type: String.
<Object>.predictedFieldName	Identifies the field to train for predictability. Data type: String.
<Object>.processingLanguage	Processing language in two-letter ISO 639-1 language code format. Data type: String.
<Object>.scope	Object scope. Currently the only valid value is `global`. Data type: String.
<Object>.stopwords	Optional. Preset list of strings automatically generated based on the processingLanguage setting. For details, see "Creating a custom stopwords list". Data type: Array.
<Object>.trainingFrequency	Frequency at which to retrain the model. Possible values: every_30_days, every_60_days, every_90_days, every_120_days, every_180_days, run_once. Default: run_once. Data type: String.
<Object>.updateFrequency	Frequency at which the model for the solution definition must be rebuilt. Possible values: do_not_update, every_1_day, every_1_hour, every_6_hours, every_12_hours, every_1_minute, every_15_minutes, every_30_minutes. Default: do_not_update. Data type: String.

The following example gets the properties of a solution object in the store.

var myCluster = sn_ml.ClusteringSolutionStore.get("ml_x_snc_global_global_clustering_solution");

gs.print(JSON.stringify(JSON.parse(myCluster.getProperties()), null, 2));

Output:

{
  "algorithmConfig": {
    "algorithm": "kmeans",
    "targetCoverage": "90"
  },
  "datasetProperties": {
    "tableName": "incident",
    "fieldNames": [
      "category",
      "short_description",
      "state",
      "description"
    ],
    "encodedQuery": "activeANYTHING"
  },
  "domainName": "global",
  "encoderProperties": {
    "datasetsProperties": [
      {
        "tableName": "incident",
        "fieldNames": [
          "assignment_group",
          "short_description",
          "description"
        ],
        "encodedQuery": "activeANYTHING"
      }
    ],
    "domainName": "global",
    "label": "my encoder definition",
    "name": "ml_x_snc_global_global_my_encoder_definition",
    "processingLanguage": "en",
    "scope": "global",
    "stopwords": [
      "Default English Stopwords"
    ],
    "trainingFrequency": "run_once"
  },
  "groupByFieldName": "category",
  "inputFieldNames": [
    "short_description"
  ],
  "label": "clustering solution",
  "minRecordsPerCluster": 2,
  "name": "ml_x_snc_global_global_clustering_solution",
  "processingLanguage": "en",
  "scope": "global",
  "stopwords": [
    "Default English Stopwords"
  ],
  "trainingFrequency": "run_once",
  "updateFrequency": "do_not_update"
}
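
Once parsed, the properties object can be inspected like any other JavaScript object. As one illustration, the sketch below lists dataset fields that are neither input fields nor the group-by field, which could help when choosing cluster concept fields. The unusedDatasetFields helper is hypothetical; the property names are taken from the sample output above.

```javascript
// Sketch only: unusedDatasetFields is a hypothetical helper, not part of sn_ml.
// properties: the parsed result of getProperties(), shaped like the sample output.
function unusedDatasetFields(properties) {
    var used = (properties.inputFieldNames || []).slice();
    if (properties.groupByFieldName) {
        used.push(properties.groupByFieldName);
    }
    // Keep only dataset fields that are not referenced as input or group-by fields.
    return properties.datasetProperties.fieldNames.filter(function (name) {
        return used.indexOf(name) === -1;
    });
}

// Possible use in a server-side script (not executed here):
// var props = JSON.parse(myCluster.getProperties());
// gs.print(unusedDatasetFields(props).join(', '));
```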

ClusteringSolution - getVersion(String version)

Gets a solution by the specified version number.

Table 14. Parameters
Name	Type	Description
version	String	Existing version number of the solution.

Table 15. Returns
Type	Description
Object	Specified version of the ClusteringSolution() object, on which you can call ClusteringSolutionVersion API methods.

The following example shows how to get the training status of a solution by version number.

var mlSolution = sn_ml.ClusteringSolutionStore.get('ml_x_snc_global_global_clustering');

gs.print(JSON.stringify(JSON.parse(mlSolution.getVersion('1').getStatus()), null, 2));

Output:

{
  "state": "solution_complete",
  "percentComplete": "100",
  "hasJobEnded": "true"
}

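ClusteringSolution - setActiveVersion(String version)

Activates the specified version of a solution in the store.

Table 16. Parameters
Name	Type	Description
version	String	Name of the ClusteringSolution() object version to activate. Activating this version deactivates the other versions.

Table 17. Returns
Type	Description
None

The following example shows how to activate a solution version in the store.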

sn_ml.ClusteringSolution.setActiveVersion("ml_incident_categorization");

ClusteringSolution - submitTrainingJob()

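Submits a training job.

Note:

Before running this method, you must first add the solution to the store using the ClusteringSolutionStore - add() method.

Table 18. Parameters
Name	Type	Description
None

Table 19. Returns
Type	Description
Object	ClusteringSolutionVersion object corresponding to the ClusteringSolution being trained.

The following example shows how to create a dataset, apply it to a solution, add the solution to the store, and submit a training job.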

// Create a dataset 
var myData = new sn_ml.DatasetDefinition({

  'tableName' : 'incident',
  'fieldNames' : ['assignment_group', 'short_description', 'description'],
  'encodedQuery' : 'activeANYTHING'

});

// get a trained encoder from the store
var myEncoder = sn_ml.EncoderStore.get('ml_x_snc_global_global_encoder');

// Create a solution 
var mySolution = new sn_ml.ClusteringSolution({

  'label': "my solution definition",
  'dataset' : myData,
  'encoder' : myEncoder,
  'predictedFieldName' : 'assignment_group',
  'inputFieldNames':['short_description']

});

// Add the solution to the store to later be able to retrieve it.
var my_unique_name = sn_ml.ClusteringSolutionStore.add(mySolution);

// Train the solution - this is a long running job 
var myClusterVersion = mySolution.submitTrainingJob();
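
Since submitTrainingJob() requires the solution to be in the store first, the two calls can be kept together so the order cannot be reversed by accident. The sketch below shows one way to do that. The addAndTrain helper is hypothetical; it only uses the add() and submitTrainingJob() calls shown in the example above and takes the store as an argument.

```javascript
// Sketch only: addAndTrain is a hypothetical helper, not part of sn_ml.
// store: an object exposing add(), such as sn_ml.ClusteringSolutionStore.
// solution: an object exposing submitTrainingJob(), such as a ClusteringSolution.
function addAndTrain(store, solution) {
    var name = store.add(solution);              // must happen before training
    var version = solution.submitTrainingJob();  // long-running job on the instance
    return { name: name, version: version };
}

// Possible use in a server-side script (not executed here):
// var result = addAndTrain(sn_ml.ClusteringSolutionStore, mySolution);
// gs.print('Submitted ' + result.name + ' for training');
```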