
****** This article applies to Quebec and later releases that support Multi-Language NLU ******
Starting with the Quebec release, several non-English languages are supported within NLU on the Now Platform as part of multi-language NLU support. The community article 'Guided Overview to Implementing Multilingual NLU Models in NOW platform' provides guidance on implementing multilingual NLU in Quebec. In addition, a playbook (a link will be added here) will soon be available to help customers implement multilingual NLU; it will contain all of this information cohesively in a single document.
When working with multilingual NLU models, the platform's translation capabilities can be used to machine translate the base NLU models initially. After these initial translations, however, it is recommended to have linguists proofread the machine translations to further improve the model. These linguists may not have admin access to NLU, nor be familiar enough with the NLU Workbench to review and update the translations directly from within the Workbench.
This blog provides steps and guidance for proofreading utterances outside of the instance and bringing the updates back into the required tables using the platform's export, import set, and scripting capabilities. For the purposes of this blog, it is assumed that the Internationalization and NLU plugins are already activated and configured; the focus is on managing the proofreading of utterances.
IMPORTANT DISCLAIMER: While the scripts provided in this blog have been adequately tested, they are not officially supported by ServiceNow.
Step 1: Export the foreign language model utterances in CSV for proofreading
While admins can review model utterances directly through the NLU Workbench, most resources that will review the machine-translated model utterances, whether internal or external, may not have admin access to the Now Platform. Exporting the material in spreadsheet format is therefore necessary, especially since test utterances cannot currently be edited in the Workbench.
CSV is the best format to use for this, since it is what is supported when the proofread utterances need to be imported back into the instance. (A scripted alternative to the list export is also sketched at the end of this step.)
- Export train utterances from the NLU model:
  - Go to the sys_nlu_utterance table and filter the list by dot-walking to the Intent --> NLU Model --> Display name column, setting the list filter to the model you want to export
  - If only certain intents' utterances are needed, the list can be further filtered by the Intent column as well
  - Ensure that only the columns that need to be exported to CSV are included in the list layout
  - Export the list to CSV by right-clicking a column header and selecting Export > CSV
- Export test utterances from the Batch Testing table:
  - Enter nlu_batch_test_utterance.list in your ServiceNow instance's filter navigator and filter the list by the Test Set column
  - If only certain intents' utterances are needed, the list can be further filtered by the Intent column as well
  - Ensure that only the columns that need to be exported to CSV are included in the list layout
  - Export the list to CSV by right-clicking a column header and selecting Export > CSV
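As an optional alternative to the list export above, the same dot-walked filter can be expressed in a short background script that prints the train utterances as CSV lines. This is only a minimal sketch, assuming admin access and the intent.model.display_name field path used by the update script later in this article; <MODEL NAME> is a placeholder for your model's display name, and the list export remains the recommended route.
******* Script start ******
// Minimal sketch: print a model's train utterances as CSV lines from a background script.
var model_name = "<MODEL NAME>"; // placeholder: display name of the NLU model to export
var utterances = new GlideRecord("sys_nlu_utterance");
utterances.addEncodedQuery("intent.model.display_name=" + model_name);
utterances.query();
gs.print("intent,utterance");
while (utterances.next()) {
    // quote the values so commas inside utterances do not break the CSV columns
    gs.print('"' + utterances.getDisplayValue("intent") + '","' + utterances.getValue("utterance") + '"');
}
******* Script end ******
The printed lines can be copied into a UTF-8 text file and shared with reviewers in the same way as the list export.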
Important Note: After the original translations are performed and the utterances are exported for proofreading as per the steps above, DO NOT re-translate or re-publish translations for the model's utterances using Dynamic Translation until the proofread utterances have been imported back into the instance. Doing so could modify the original values that were exported for proofreading, and the step that imports the proofread utterances back into the instance relies on those original values.
Step 2: Forward multilingual model machine translations to reviewers
Reviewers should be native-speaking subject-matter experts, either sourced in-house or outsourced to your preferred language service provider. At this point, reviewers can work directly from CSV files in the UTF-8 character set; there is no need for them to work from the Now Platform.
Step 3: Proofread target-language machine translations
Different approaches may be taken to proofread machine-translated utterances.
- The exhaustive approach
  - Each model and test utterance is thoroughly proofread by a native-speaking subject matter expert (e.g., a live support agent) or a third-party language service provider (LSP) to ensure the machine-translated utterances read as naturally as possible.
  - While model utterances can be reviewed directly within the Now Platform if your reviewers have admin access, test utterances have to be exported to Excel/CSV, proofread, and finally re-imported into the platform. For that reason, you are better off exporting both model and test utterances for review.
Note: You can expect native-speaking subject-matter experts to review around 250 utterances per hour, provided they can dedicate their full attention to this task.
- The quick approach
  - From our experience training and testing models with client data, you can afford to focus proofreading efforts only on fixing the most common errors, namely software and hardware trademarks that the machine translation spoke translated literally. Acronyms and idiomatic expressions should also be watched for, as they often tend to be translated literally as well. You can expect this approach to reduce model performance by about 1%, which may well be worth the trade-off, especially since the availability of native-speaking subject-matter experts is often constrained.
Note: With this approach, you can expect native-speaking subject-matter experts to cover 4-5 times more utterances per hour.
Step 4: Pre-steps to upload proofread target-language model translations
- The proofread translations will be uploaded back into the Now Platform (CSV format, UTF-8 charset).
- Before importing a CSV file, go to "Import Export" under "System Properties". Set the "Import Charset" value ("Import Properties" --> "CSV Format" --> "Import Charset") to UTF-8 if it is set to any other value.
- Alternatively, you can open the sys_properties table, look for the 'glide.import.csv.charset' system property, and make sure it is set to UTF-8 while performing these steps (a script sketch for this check follows this list).
- Restore the original value after you are done with this operation.
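The same charset check can also be scripted. The following is only a minimal sketch, assuming it is run as a background script with admin rights; it reads the glide.import.csv.charset property, records the current value so it can be restored later, and sets it to UTF-8 if needed.
******* Script start ******
// Minimal sketch: check (and if necessary change) the CSV import charset.
var previous_charset = gs.getProperty("glide.import.csv.charset");
gs.print("Current import charset: " + previous_charset);
if (previous_charset != "UTF-8") {
    // note: gs.setProperty also overwrites the property description with the third argument
    gs.setProperty("glide.import.csv.charset", "UTF-8", "CSV import character set");
    gs.print("Import charset temporarily set to UTF-8; restore '" + previous_charset + "' when the import is complete");
}
******* Script end ******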
Step 5: Upload proofread training utterances using import sets
Import sets allow administrators to import data from various data sources, and then map that data into ServiceNow tables. [ Import sets ] [ Key concepts ] [ Custom CSV ]
Note: The following instructions can be skipped for test utterances if you are using the Batch Testing tool, since the same steps can be performed from the Batch Testing tool's user interface on the instance.
Create a CSV file with two columns: original and modified. The CSV file can contain a header row with the labels original and modified. The original column should contain the machine-translated values, and the modified column should contain the proofread translation of the original. If there are no modifications, you can skip these steps.
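For illustration, the file layout looks like this (the values below are placeholders; real rows contain the machine-translated utterance and its proofread counterpart in the target language). Rows where the modified column is left empty are skipped by the upload script.
original,modified
"<machine-translated utterance>","<proofread utterance>"
"<machine-translated utterance that needs no change>",""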
1) Switch to Global scope
2) Go to “Load Data” under “System Import Sets”
3) Create a table (Ex: nlu_train_utterance_zh_modified) by uploading the modified training CSV
Go to your table (Ex: u_nlu_train_utterance_zh_modified.list) and check that the two columns (original and modified) were uploaded correctly.
4) Run the following script. Make sure the table name and column names in the script below match the table you just uploaded. Also, update the model_name variable with the display name of the NLU model where the proofread utterances need to be updated.
******* Script start ******
var table_name = "u_nlu_train_utterance_zh_modified";
var original_column_name = "u_original";
var modified_column_name = "u_modified";
var model_name = "<MODEL NAME>";

// loop through the imported (proofread) utterances
var modified_utterances = new GlideRecord(table_name);
modified_utterances.query();
while (modified_utterances.next()) {
    // get the original (machine-translated) and modified (proofread) values
    var original = modified_utterances.getValue(original_column_name);
    var modified = modified_utterances.getValue(modified_column_name);
    if (original && modified) {
        // find the original utterance in the NLU utterance table for the given model
        var nlu_utterances = new GlideRecord("sys_nlu_utterance");
        var query = "utterance=" + original + "^intent.model.display_name=" + model_name;
        nlu_utterances.addEncodedQuery(query);
        nlu_utterances.query();
        if (nlu_utterances.next()) {
            // replace the original value with the proofread one
            nlu_utterances.setValue("utterance", modified);
            nlu_utterances.setWorkflow(false);
            nlu_utterances.update();
            gs.log("Replaced " + original + " with " + modified);
        } else {
            // the original utterance could not be found for this model
            gs.log("Cannot find " + original + " in sys_nlu_utterance table");
        }
    }
}
******* Script end ******
Check the script output to confirm that all replacements were successful; each successful replacement is logged as "Replaced <original> with <modified>", and any utterance that could not be matched is logged as "Cannot find <original> in sys_nlu_utterance table".
Tips:
- If a target-language utterance is translated exactly like the source-language utterance, the import will create a duplicate source-language utterance.
- If the script does not print any output, or if you receive errors containing "null", your table name or column names may be incorrect. Sometimes when you upload a CSV, the generated column names contain a double underscore; make sure the prefix is "u_" rather than "u__" (Ex: u_modified instead of u__modified). Update the first three lines of the script with the correct names if necessary (the sketch below can help confirm the actual column names).
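If you are unsure what columns the import created, they can be listed from the data dictionary. This is only a minimal sketch, assuming it is run as a background script and that the import table is the one created in Step 5 (adjust the table name as needed).
******* Script start ******
// Minimal sketch: list the column names of the imported table from sys_dictionary.
var dict = new GlideRecord("sys_dictionary");
dict.addQuery("name", "u_nlu_train_utterance_zh_modified"); // the import table created in Step 5
dict.addNotNullQuery("element");
dict.query();
while (dict.next()) {
    gs.print(dict.getValue("element"));
}
******* Script end ******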