Enabling Multilingual Search with AI Search

Heather Phipps · ‎10-08-2021

This article has been deprecated. Please see Enabling Multilingual Search with AI Search: Utah Edition for updated information.

Introduction

AI Search supports indexing and search for all languages offered by the Now Platform. Advanced linguistic search features are available in English, French Canada, French, German, Japanese, and Spanish.

ServiceNow users expect search to work seamlessly across multiple languages. They want to be able to issue queries across all available content, regardless of language, and retrieve highly relevant results. This article describes the benefits of AI Search relative to Zing for multilingual content, how document filtering by language works in AI Search, and how to configure AI Search to improve the search experience for international users.

Multilingual Search Challenges

Search is fundamentally about matching terms in your query to documents in your index containing these terms. Results relevancy comprises both precision—what percentage of retrieved results are relevant—and recall—what percentage of relevant results are retrieved. A search engine's precision and recall depend on the quality of the natural language processing (NLP) applied to indexed documents and query text. Out of the box, before any tuning, AI Search provides a ~10% relevancy lift over Zing¹ largely due to its more sophisticated NLP. NLP for search includes three essential tasks: tokenization, language-specific normalization, and decompounding.
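As a rough illustration of the two measures defined above, here is a minimal sketch that computes precision and recall for a single query. The function name and the document IDs are hypothetical, and the sketch assumes each ID appears at most once in the retrieved list.

```javascript
// Precision and recall for one query.
// `retrieved` and `relevant` are arrays of hypothetical document IDs.
function precisionRecall(retrieved, relevant) {
  const relevantSet = new Set(relevant);
  const hits = retrieved.filter((id) => relevantSet.has(id)).length;
  return {
    // Share of retrieved results that are relevant.
    precision: retrieved.length === 0 ? 0 : hits / retrieved.length,
    // Share of relevant results that were retrieved.
    recall: relevant.length === 0 ? 0 : hits / relevant.length,
  };
}

// Four results returned, two of them relevant; three relevant documents exist.
const scores = precisionRecall(["kb1", "kb2", "kb3", "kb4"], ["kb1", "kb3", "kb7"]);
console.log(scores.precision); // 0.5
console.log(scores.recall);
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were retrieved (recall 2/3).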

Tokenization

Before you can match terms in your query to terms in your index, you need to tokenize the text, i.e., break it apart into discrete words. Tokenization is relatively straightforward in languages in which words are mostly space delimited, such as English. In Japanese, on the other hand, words aren’t space delimited. Both AI Search and Zing perform Japanese tokenization based on morphological analysis, which breaks up text into real words, ensuring accurate search results. Here is an example:

東京都の人口 (Tokyo population)

Using morphological analysis, this text gets tokenized as follows:

東京 (Tokyo)
都 (city)
の ([possessive particle])
人口 (population)

By contrast, simple substring matching would result in an incorrect match for the query 京都 (Kyoto), decreasing search precision.
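The difference can be seen in a small sketch. The token list is hard-coded from the example above rather than produced by a real morphological analyzer, and the helper names are hypothetical.

```javascript
const text = "東京都の人口";
// Tokens as listed in the example above (hard-coded, not computed).
const tokens = ["東京", "都", "の", "人口"];

// Naive approach: does the query appear anywhere in the raw text?
const substringMatch = (query) => text.includes(query);
// Token-based approach: does the query equal one of the indexed tokens?
const tokenMatch = (query) => tokens.includes(query);

console.log(substringMatch("京都")); // true  (false positive for Kyoto)
console.log(tokenMatch("京都"));     // false (no such token)
console.log(tokenMatch("東京"));     // true  (Tokyo is a real token)
```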

Language-Specific Normalization

Next, text needs to be normalized at both the character and word level to ensure that a query for one form of a word matches all other forms of the word. As illustrated by the following examples, at the character level, accents should be removed since they are frequently omitted from search queries, and Asian half-width characters should be normalized to match their full-width counterparts:

Pâtisserie → Patisserie
ﾎﾃﾙ → ホテル
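Both character-level transformations can be approximated with standard Unicode normalization. The following is a minimal sketch, not a description of how AI Search implements normalization; the function name is hypothetical.

```javascript
// Character-level normalization using built-in Unicode normalization forms.
function normalizeCharacters(text) {
  return text
    .normalize("NFKC")               // compatibility forms: half-width katakana -> full-width
    .normalize("NFD")                // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop combining diacritical marks (accents)
    .normalize("NFC");               // recompose whatever remains
}

console.log(normalizeCharacters("Pâtisserie")); // Patisserie
console.log(normalizeCharacters("ﾎﾃﾙ"));        // ホテル
```

Restricting the removal step to the U+0300 to U+036F range is a deliberate choice in this sketch: it strips Latin-style accents while leaving Japanese voiced sound marks, which live in a different range, intact.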

Word-level normalization is especially important in European languages because they are highly inflected, meaning words are modified based on tense, quantity, gender, aspect, and other factors. For example, the query “selling” should match a document containing the term “sold”.

AI Search supports lemmatization—the identification of the dictionary form of the word, based on context—in English, French Canada, French, German, Japanese, and Spanish. By contrast, Zing only supports lemmatization in Japanese. Zing uses stemming—an alternative normalization method that truncates words via simplistic rules, without considering context—in English, French, and German. Stemming can reduce recall because related words may have different stems. Continuing our previous example:

Input	Zing	AI Search
selling	sell	sell
sold	sold	sell

Stemming can also lower precision because unrelated words may share the same stem, as in this French example:

Input	Zing	AI Search
faut (necessary)	faut	faut
faute (mistake)	faut	faute
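The two tables above can be reproduced with a toy sketch. The suffix rules and the lemma dictionary below are hypothetical stand-ins chosen only to mirror the examples; they are not the rules used by Zing or AI Search.

```javascript
// Toy stemmer: strips a suffix by rule, ignoring context (hypothetical rules).
const STEM_SUFFIXES = { en: ["ing"], fr: ["e"] };
function toyStem(word, lang) {
  for (const suffix of STEM_SUFFIXES[lang] ?? []) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Toy lemmatizer: looks up the dictionary form (hypothetical entries from the tables).
const LEMMAS = { selling: "sell", sold: "sell", faut: "faut", faute: "faute" };
function toyLemma(word) {
  return LEMMAS[word] ?? word;
}

// Recall: "selling" and "sold" only meet under lemmatization.
console.log(toyStem("selling", "en") === toyStem("sold", "en")); // false
console.log(toyLemma("selling") === toyLemma("sold"));           // true

// Precision: "faut" and "faute" wrongly collide under stemming.
console.log(toyStem("faut", "fr") === toyStem("faute", "fr"));   // true
console.log(toyLemma("faut") === toyLemma("faute"));             // false
```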

Decompounding

In certain languages, such as German, compound words are prevalent. These words need to be broken down into their constituent parts to maximize recall. For example, the compound word “humanressourcen” should be broken down into its component terms, “human” and “ressourcen”. This will ensure that queries for the component terms match documents containing the compound word and vice versa. Only AI Search supports German decompounding.
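As an illustration of the idea, the sketch below splits a compound using a tiny hypothetical lexicon and indexes both the compound and its parts. A production decompounder relies on a full dictionary and linguistic rules; nothing here reflects the actual AI Search implementation.

```javascript
// Hypothetical mini-lexicon; a real decompounder uses a full German dictionary.
const LEXICON = new Set(["human", "ressourcen"]);

// Returns the component words if `word` splits fully into lexicon entries,
// otherwise returns the (lowercased) word on its own.
function decompound(word) {
  const lower = word.toLowerCase();
  function split(start) {
    if (start === lower.length) return [];
    // Try the longest candidate part first.
    for (let end = lower.length; end > start; end--) {
      const part = lower.slice(start, end);
      if (LEXICON.has(part)) {
        const rest = split(end);
        if (rest !== null) return [part, ...rest];
      }
    }
    return null; // no full split from this position
  }
  return split(0) ?? [lower];
}

// Index the compound itself plus its parts so either kind of query can match.
function indexTerms(word) {
  return new Set([word.toLowerCase(), ...decompound(word)]);
}

console.log(decompound("humanressourcen"));           // [ 'human', 'ressourcen' ]
console.log(indexTerms("humanressourcen").has("human")); // true
```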

Document Filtering by Language in AI Search

There are two categories of translated content in the Now Platform:

Translated fields, such as Catalog Item fields.
Translated documents, such as Knowledge articles.

The default AI Search filtering behavior differs for these two types of content.

In the case of Catalog Item search, if a field lacks a translation in the user’s session language, AI Search effectively falls back to exact matching against the English-language field value.

By contrast, in the case of Knowledge search, AI Search only searches articles that are in the same language as the user’s session.

Enabling Global Fallback in Quebec & Rome

Users expect to search across Knowledge content not just in their session language, but also in English, similar to how Catalog Item search works today. One reason for this is that customers don’t necessarily want to localize all their Knowledge content—they have global Knowledge content, typically written in English, that needs to be searchable by all users. To better align the Knowledge search experience with that of Catalog Item search, you can treat English Knowledge content as global, making it “exact match” searchable by all users, regardless of their session language, by following this procedure:

1. Define a new column on the Knowledge [kb_knowledge] table. In the following example, the column is named u_search_index_language.
2. Populate the new u_search_index_language column for Knowledge articles:
a. To populate the column for inserted, updated, or displayed articles in kb_knowledge, create a Business Rule. This Business Rule should leave the new column empty for articles in English; for other articles, it should copy the value (2-letter language code) from the existing language field.

See sample Business Rule (attached):
sys_script_0f4c519054813010f877f5103057e671.xml

b. To populate the column for existing articles, navigate to System Definition > Scripts - Background and run the following background script:
```
// Backfill u_search_index_language for existing non-English Knowledge articles.
var kbGR = new GlideRecord("kb_knowledge");
kbGR.addQuery("language", "!=", "en");
kbGR.query();
while (kbGR.next()) {
  // Compare string values rather than GlideElement objects.
  if (kbGR.getValue("u_search_index_language") !== kbGR.getValue("language")) {
    kbGR.setValue("u_search_index_language", kbGR.getValue("language"));
    kbGR.setWorkflow(false); // skip Business Rules for this update
    kbGR.update();
  }
}
```
Note: Be sure to adjust the table and field names in both the sample Business Rule and background script to match your content.
3. Navigate to AI Search > AI Search Index > Indexed Sources.
4. Edit the Knowledge Table record.
5. In the Field Settings & Mapping related list, locate the map_to_raw setting that has translation_language_id as its value and change its field from language to the new u_search_index_language field.
6. Reindex all tables for the Knowledge Table indexed source.
7. After any bulk importing of content, be sure to re-run the background script outlined in Step 2, above, to populate the u_search_index_language field for the new records.
8. Follow the same procedure for any indexed custom tables that do not extend kb_knowledge and that store translated content as separate records rather than as translated fields.
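The rule the Business Rule needs to apply (empty for English, language code otherwise) can be sketched as a small helper. This is not the content of the attached sample Business Rule; the function name is hypothetical, and the commented usage assumes the standard `current` record object available in Business Rules.

```javascript
// Value to store in u_search_index_language for a given article language code.
// English articles get an empty value; other articles keep their language code.
function searchIndexLanguageFor(language) {
  return language === "en" ? "" : language;
}

// Inside a ServiceNow Business Rule, this might be applied roughly as:
//   current.setValue("u_search_index_language",
//                    searchIndexLanguageFor(current.getValue("language")));

console.log(JSON.stringify(searchIndexLanguageFor("en"))); // ""
console.log(JSON.stringify(searchIndexLanguageFor("fr"))); // "fr"
```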

How It Works

The u_search_index_language field gets used in two primary ways:

For filtering: search queries are only matched against content when the content language, as defined by this field, matches the session language or is null.
For determining which language processor is applied to the record at index time. If this field is null, the document is processed as if it were English.

As a result, setting the value of this field to null for content in English makes this content “exact match” searchable to all users, regardless of their session language.
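The two uses of the field can be summarized in a short sketch. The function names and sample records are hypothetical; the logic simply restates the two rules above.

```javascript
// Filtering: a document is eligible when its index language matches the
// session language or is empty (null or "").
function isEligible(docIndexLanguage, sessionLanguage) {
  return !docIndexLanguage || docIndexLanguage === sessionLanguage;
}

// Index time: an empty index language means the document is processed as English.
function processorLanguage(docIndexLanguage) {
  return docIndexLanguage || "en";
}

// Hypothetical articles; the English one has its index language left empty.
const articles = [
  { id: "KB-fr", lang: "fr" },
  { id: "KB-de", lang: "de" },
  { id: "KB-en", lang: null },
];

const visibleToFrench = articles
  .filter((a) => isEligible(a.lang, "fr"))
  .map((a) => a.id);
console.log(visibleToFrench);         // [ 'KB-fr', 'KB-en' ]
console.log(processorLanguage(null)); // en
```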

Interaction with Other Features

The following table describes interactions between the fallback mechanism and various search features in Quebec and Rome:

Feature	Interaction with Fallback
Auto-complete	Auto-complete suggestions are limited to the session language. This behavior is the same as without the fallback.
Stop words	AI Search uses the session language’s stop word dictionary if available; otherwise, it falls back to using the English dictionary. This behavior is the same as without the fallback.
Synonyms	AI Search uses the session language’s synonym dictionary if available; otherwise, it falls back to using the English dictionary. This behavior is the same as without the fallback.
Typo handling	AI Search uses the union of the session language and English dictionaries. This behavior differs from the default.
Result Improvement Rules	Only result improvement rules for the session language will be applied. This behavior is the same as without the fallback.

Testing

To validate that the fallback is working as designed, use the following test cases, substituting the languages and terms that are relevant to your use case:

When the session language is French, French linguistics should still apply.
Example: A query for dormir should retrieve a French Knowledge article containing dormais.
Similarly, English search should remain linguistically aware (i.e., not be reduced to exact match) when the session language is English.
Example: A query for ran should retrieve an English Knowledge article containing run.
When the session language is French, and when a term is present in both English and French Knowledge articles, querying for this term should return both articles.
Example: A query for Java should retrieve both French and English Knowledge articles containing Java.

Global Fallback in San Diego

In the San Diego release, we are productizing this fallback mechanism by introducing the concept of a Global Locale, an additional locale used for searching Knowledge articles. The Global Locale will be defined by the instance locale by default and configurable by the admin. Global fallback will be enabled or disabled using the glide.ais.translate.enable_global_language_fallback system property; it will be disabled by default, producing the same search behavior as in the Quebec and Rome releases. If you’ve already implemented global fallback in Quebec or Rome using the procedure described here and are planning to upgrade to San Diego, please stay tuned for a follow-up article with step-by-step instructions for ensuring that the fallback continues to work post-upgrade.

¹(AI Search NDCG) - (Zing NDCG) as measured on hand-labeled golden sets.

Jack Littlewort · ‎11-18-2021

Is this similar to the workaround for language search described here?

AI Search knowledge language internationalization - Share | ServiceNow Developers

Heather5 · ‎10-27-2022

Can a user have visibility on all articles, regardless of language, and filter on the languages they speak? We have users who speak 4 - 5 languages and have their default language set as English (therefore the global fallback doesn't help). They want to be able to see articles in ALL languages they speak, not just one or two.

DorianK · ‎10-27-2022

+1 @Heather5 comment. We ran into this issue as well when trying to support multi languages. We ended up turning off the default AI search behavior and use search result improvement to show the best one but I think official support would be better.

Heather5 · ‎10-27-2022

@DorianK thanks for the fast reply - so do you mean that you kept AI Search (i.e. didn't revert back to Zing), and just adjusted the configuration to achieve what you describe above? Going back to Zing is not an option for us, but I am hopeful that your workaround could work for us too. FYI, I also created an idea (AI Search - Enable Multi-language view option), it would be great if you can vote it up so we can get this solved officially.

DorianK · ‎10-27-2022

@Heather5- yeah we kept AI search and adjusted configuration to show languages of the user's preference first (and then further down were other languages). I think you can also build a picker on the actual article to switch to a translated version of the article. That's what I remember off the top of my head.

Elizabeth32 · ‎12-20-2022

We have users that have their session language set to English, but they want to see articles in other languages. Not many users translate the platform. Agents should be able to see articles in all languages that they have access to see, not just articles in their session language and English (as is the case with global fallback). Is this on the roadmap for future releases so we don't have to customize to fix this?

Ja1 · ‎01-18-2023

@DorianK Would be interested to know what exactly you did to show the user's preferred language first and then other languages further down the list?

We have a similar requirement here, but as of now only articles matching the user's session language appear in the search results.

DorianK · ‎01-24-2023

@Ja1- if you look at the instructions she provided with the language / u_search_index_language on the field setting and mapping, changing that will remove the "only see your language" behavior (the language field mapping is driving that behavior AFAIK). Then you can use boosting rules to improve which ones are seen first. That's the quick summary of how to go about it.