How to bring External Content into AI Search

Shamus Mulhall · ‎03-06-2023

Introduction

The goal of this article is to provide detailed steps to configure and index external content for AI Search. This example will be specific to the process of indexing content located outside of your Now Platform instance. Indexing this content will make it available to search in AI Search.

Overview

This article steps through the configuration required to index external content using Flow Designer. For this example, the external source is a simple JSON object. There are both pull and push options for indexing content into AI Search to make it available for search. The push option uses the existing REST APIs to send content to AI Search; it is the most complex option, but it offers the most flexibility. This document covers the pull option, where a request is made to an external source for content and that content is processed within a flow built using IntegrationHub and the AI Search Spoke.

There are several other connector/pull options available; however, they have associated licensing requirements. For more information about licensing and connectors, we recommend connecting with your account representative.

For this example, we will use a simple REST endpoint to retrieve JSON content. To begin, we will create the external content schema, then link it to a search profile. Following that is a detailed walkthrough of an example of how to index external content.

Create External Content Schema

To begin, define a schema table with columns corresponding to fields on records from an external data source. AI Search uses the schema when indexing content from the external data source. The external content schema table does not store data in the database. Instead, its columns serve as a map of AI Search index fields to populate when you index content from external data sources.

Navigate to All > AI Search > External Content > Create Schema to create a new table. Give the schema a name and select Submit. The OOTB table should be sufficient for most scenarios; if there are additional fields aside from the title, you can add them at this stage. For this exercise, we will add the ‘short_description’ field as an additional string field. This will help fill out the search results view.
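To make the "schema as a map" idea concrete, here is a minimal conceptual sketch in Python. It is not how the platform implements the schema; the column tuple, the function name, and the sample record are all hypothetical. It only illustrates that the schema columns decide which values from an external record end up as index fields.

```python
# Hypothetical column names on the external content schema table:
# the default title column plus the short_description column added above.
SCHEMA_COLUMNS = ("title", "short_description")


def to_index_fields(external_record: dict) -> dict:
    """Return only the values that have a matching schema column.

    Keys without a schema column are dropped in this sketch, mirroring the
    idea that the schema defines which index fields can be populated.
    """
    return {
        column: str(external_record[column])
        for column in SCHEMA_COLUMNS
        if external_record.get(column) is not None
    }


# Hypothetical record from an external source; "internal_rank" has no column.
record = {
    "title": "VPN setup guide",
    "short_description": "How to connect",
    "internal_rank": 7,
}
fields = to_index_fields(record)
print(fields)
```

In this sketch, a value in the external record that has no corresponding column is simply not carried forward, which is why any field you want in the search results view needs a column at this stage.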

Link External Content to Search Application

After creating the external content table, a link to the search profile is required. First create the Indexed Source, then create a mapping of “short_description” to “text”. This step is done to facilitate rendering results.

Now create a Search Source to allow us to add this content to the appropriate search profile.

Add the new External Content search source to the OOTB Service Portal search profile.

Create Data Stream Action

While not an AI Search feature, the Data Stream action is part of IntegrationHub and is a key component of indexing external content. Using the filter navigator, open Flow Designer (All > Process Automation > Flow Designer) and select New > Data Stream. The flow will use the For each flow logic option to process stream data, allowing us to create an ingest document in AI Search for each object in the data stream. The configuration of an example Data Stream with a REST request, a JSON/XML Splitter, and a Script Parser is highlighted below.

Splitter step with Source Format -> JSON and a JSON item path configured.

Parsing the JSON content from the source.

Finally configuring the output variables to be used in the external content flow.
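The splitter and parser steps can be pictured with a small Python sketch. It is an illustration only, not the Data Stream implementation: the function names, the item path `result > articles`, and the `id`, `title`, and `summary` keys are hypothetical stand-ins for whatever your endpoint returns.

```python
import json


def split_items(payload: str, item_path: list[str]) -> list[dict]:
    """Walk the item path into the parsed payload and return the array found there."""
    node = json.loads(payload)
    for key in item_path:
        node = node[key]
    if not isinstance(node, list):
        raise ValueError("item path must point at a JSON array")
    return node


def parse_item(item: dict) -> dict:
    """Map one raw item to the output variables the indexing flow will read."""
    return {
        "id": str(item["id"]),
        "title": item.get("title", ""),
        "short_description": item.get("summary", ""),
    }


# Hypothetical response body from the REST step.
SAMPLE_PAYLOAD = json.dumps(
    {
        "result": {
            "articles": [
                {"id": 1, "title": "VPN setup", "summary": "Connect remotely"},
                {"id": 2, "title": "Password reset"},
            ]
        }
    }
)

outputs = [
    parse_item(item)
    for item in split_items(SAMPLE_PAYLOAD, ["result", "articles"])
]
print(outputs)
```

In this sketch, a response body that is not JSON (an HTML error page, for example) fails in `json.loads` before any splitting happens, so it is worth confirming what the endpoint actually returns before configuring the item path.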

Create Indexing Flow

Again, within Flow Designer select New > Flow and set the Flow properties appropriately.

After setting the properties, the first component within the flow itself is the Initialize Batcher action. This initializes a new batcher to queue documents for indexing. The size setting on the batcher depends on the size of the content being indexed; 20 is a reasonable starting point.

After initializing the batcher, we will connect to the external source and retrieve the content. How the content is retrieved will vary; in this case, a data stream action will be used to get JSON content from a sample REST API. Details on creating a data stream action are available here: Use a Data Stream action in a flow.

This data stream retrieves JSON content from a REST endpoint. The data stream action parses the content, and a new ingest document is created for each object.

Configure the Ingest Document action, setting the External Content Table to the new schema table created above. Set the Batcher ID to the one initialized in the first flow action. Note that the Document ID must be unique and consistent across executions of the indexing process. This eliminates duplication of content while allowing for updates.
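One way to think about a Document ID that is both unique and consistent is to derive it only from values that never change for a given record. The Python sketch below is an assumption for illustration, not a platform requirement: the function name and the choice of a SHA-256 hash are ours, and in many cases the external system's own primary key can be used directly.

```python
import hashlib


def document_id(source_name: str, external_key: str) -> str:
    """Derive a deterministic ID from the source name and the record's own key.

    The same inputs always produce the same ID, so re-running the indexing
    process targets the same document instead of creating a new one.
    """
    raw = f"{source_name}:{external_key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:32]


# Hypothetical source name and record keys.
first_run_id = document_id("sample_rest_source", "42")
second_run_id = document_id("sample_rest_source", "42")
other_id = document_id("sample_rest_source", "43")
print(first_run_id == second_run_id, first_run_id == other_id)
```

Values that change between executions, such as a timestamp or a randomly generated GUID, should be avoided as the Document ID, because each run would then look like new content.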

After creating the ingest document, we want to add it to the batcher to be indexed later. Before adding it, however, check whether the batcher is full and needs to be flushed. To do this, add a condition that checks whether there was an error while ingesting properties into the batcher. If there is an error, the next step is to commit the content to the index, which flushes the batcher.

To commit the index, add the AI Search Commit Index action, setting the Batcher ID to the ID of the previously initialized batcher.

Within the same conditional where the batcher was flushed with the Commit Index action, add an Ingest Document action to process the document that initiated the error so that content is not lost. The Ingest Document action will have the same configuration as the previous Ingest Document action.

Checking for full batcher configuration and handling error case.

Finally, after all the content has been processed, the last couple of actions can be put in place. The last documents in the batcher should be committed (Commit Index), and then the batcher itself should be released (Release Batcher).
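The overall shape of the flow described above can be simulated in a few lines of Python. This is a sketch under stated assumptions: the `Batcher` class and `BatcherFull` exception are hypothetical in-memory stand-ins, not the AI Search Spoke actions, and the comments only indicate which flow action each line corresponds to.

```python
class BatcherFull(Exception):
    """Raised when a document is added to a batcher that is already full."""


class Batcher:
    """Hypothetical in-memory stand-in for the AI Search batcher."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.queue: list[dict] = []

    def ingest(self, document: dict) -> None:
        if len(self.queue) >= self.size:
            raise BatcherFull()
        self.queue.append(document)


def index_all(documents: list[dict], batch_size: int = 20) -> list[list[dict]]:
    """Return the batches that would be committed, in order."""
    batcher = Batcher(batch_size)        # Initialize Batcher
    committed: list[list[dict]] = []

    def commit() -> None:                # Commit Index
        if batcher.queue:
            committed.append(batcher.queue)
            batcher.queue = []

    for document in documents:           # For each item in the data stream
        try:
            batcher.ingest(document)     # Ingest Document
        except BatcherFull:
            commit()                     # flush the full batcher
            batcher.ingest(document)     # re-ingest the document that hit the error
    commit()                             # commit the last documents
    # Release Batcher: nothing to free in this in-memory sketch.
    return committed


docs = [{"n": i} for i in range(45)]
batches = index_all(docs, batch_size=20)
print([len(batch) for batch in batches])
```

The point of the sketch is that the batcher size is the number of documents queued per commit, not the total number of documents to index; a large source is processed as many small batches.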

Once the flow is completed, configure the trigger for this new flow to suit the new external content's indexing requirements.

Create View for Search Results

Using the OOTB EVAM templates for external content will work; however, a custom view is often desired. This article will guide you through the process of customizing the view of external content.

Verifying External Content

To verify the new external content has been indexed and is available for search you can run the Test option from within Flow Designer.

After the test has completed, you can view the flow execution details via the link in the modal dialog. Following that, navigate to the Service Portal to view the content.

Content Updates

After the initial execution and indexing of new external content, subsequent executions may contain new content along with updates to existing content. From a configuration perspective, nothing more is needed to support this, assuming that the Document ID property in the Ingest Document flow stage is set to a unique value that is consistent across executions.
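The effect of a consistent Document ID on repeated executions can be illustrated with a plain Python dictionary standing in for the index. This is only a sketch; the `run_indexing` function, the `kb-` prefixed IDs, and the sample documents are hypothetical.

```python
def run_indexing(index: dict, documents: list[dict]) -> None:
    """Upsert every document into the index, keyed by its document ID."""
    for document in documents:
        index[document["document_id"]] = document["fields"]


index: dict = {}

# First execution: two new documents.
first_run = [
    {"document_id": "kb-1", "fields": {"title": "VPN setup"}},
    {"document_id": "kb-2", "fields": {"title": "Password reset"}},
]
# Second execution: one update to an existing document and one new document.
second_run = [
    {"document_id": "kb-2", "fields": {"title": "Password reset (updated)"}},
    {"document_id": "kb-3", "fields": {"title": "Printer setup"}},
]

run_indexing(index, first_run)
run_indexing(index, second_run)
print(sorted(index))
```

Because "kb-2" keeps the same ID in both executions, the second run replaces its content rather than adding a duplicate entry.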

Removing Content from AI Search

When using this ‘pull’ option to add external content to AI Search, the delete process is more complex. There are two available REST APIs that allow you to delete external content from AI Search. The first, deleteByQuery, deletes all external documents that match the specified query from the AI Search index. The second, deleteDocument, deletes the external document with a specified unique identifier from the AI Search index.

In addition to using the REST APIs, a new flow can also handle deleting content. This example flow configuration uses the same Data Stream action to get the content; the assumption here is that the endpoint returns the content that should be removed from AI Search. Once the content is received, the same ‘For each’ logic is followed and the AI Search Delete Document action is used. Again, the configuration assumes that the Document ID is derived the same way it was when the content was indexed.

Delete Document action configuration.

REST API Explorer example showing the delete by query API.
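The two delete styles described in this section can be contrasted with a small in-memory Python sketch. It does not call the real REST APIs or show their request format; the dictionary index, the function names, the `kb-` ID rule, and the `source` field are all hypothetical and exist only to show the difference between deleting by identifier and deleting by query.

```python
from typing import Callable


def make_id(item: dict) -> str:
    """Hypothetical ID rule; it must match the rule used when indexing."""
    return f"kb-{item['id']}"


def delete_document(index: dict, doc_id: str) -> bool:
    """Remove one document by its ID; return True if it was present."""
    if doc_id in index:
        del index[doc_id]
        return True
    return False


def delete_by_query(index: dict, matches: Callable[[dict], bool]) -> int:
    """Remove every document whose fields satisfy the predicate; return the count."""
    doomed = [doc_id for doc_id, fields in index.items() if matches(fields)]
    for doc_id in doomed:
        del index[doc_id]
    return len(doomed)


def run_delete_flow(index: dict, removed_items: list[dict]) -> list[str]:
    """For each item returned by the endpoint, delete the matching document."""
    deleted = []
    for item in removed_items:
        doc_id = make_id(item)
        if delete_document(index, doc_id):
            deleted.append(doc_id)
    return deleted


index = {
    "kb-1": {"title": "VPN setup", "source": "wiki"},
    "kb-2": {"title": "Password reset", "source": "wiki"},
    "kb-3": {"title": "Printer setup", "source": "intranet"},
}
deleted = run_delete_flow(index, [{"id": 2}, {"id": 9}])
removed_count = delete_by_query(index, lambda fields: fields["source"] == "wiki")
print(deleted, removed_count, list(index))
```

In the sketch, the item with ID 9 is silently skipped because nothing was indexed under that ID, which is the same situation that arises when the delete flow derives the Document ID differently from the indexing flow.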

AI Search Index Analytics

Index analytics provides insight into the current index content and is available as part of the Advanced AI Search Management Tools.

Navigate to AI Search > AI Search Analytics > Search Index Analytics. The Total Indexed Documents value shows the external documents that are available in AI Search.

DanilMa · ‎08-25-2023

Hi, @Shamus Mulhall! Sorry for the off-topic question, but have you come across creating filters for external data in AI Search? Unfortunately, creating a classic facet isn't available, because we just map data through a flow from one REST API to the External Content Schema (as you mentioned above). We could get the filters by requesting another API, meaning we don't store the taxonomy data in a ServiceNow table.

Rivka Br · ‎05-15-2024

Hi,

I followed your steps.
I keep getting an INTERNAL SERVER ERROR.
Do you know how we can fix it?

David B4 · ‎05-15-2024

At Knowledge, there was a presentation on platform intelligence where they had preconfigured external sources to add into AI Search, such as SharePoint, Teams, and Confluence. Is this a feature that has been delivered with Washington, or is it yet to be released?

Diogo Almeida1 · ‎05-20-2024

Hi @Shamus Mulhall thank you for the guide to index an external source to AI Search. I do have a question, which trigger did you use in the example you provided?

Thanks for your help!

Ujjwal2 · ‎06-06-2024

@Shamus Mulhall

We have 25000+ documents to be processed, so I believe we need to set 25000 value in the Initialize Batcher. But it is failing, it worked when I set it to 20.

Can anyone suggest whether I need to set it to 25000 (i.e. the same count as the number of docs to be processed) or not? If I need to process 25000

Gerard Dwan · ‎06-06-2024

Hey @Ujjwal2,

That's the number of documents you are sending to the system to process at the same time. 10-20 is a better area for the batcher. You would need to check and commit the index once the batcher is full (as mentioned in the 'If Batcher needs to be flushed' flow step above).

Salban · ‎07-10-2024

Hi @Gerard Dwan,
Since the content schema is not populated with data and is used as a map, does that mean that the flow should be run every time someone searches for something? And in that case, what should the trigger of the flow look like?

Br Salban

adambuj · ‎07-10-2024

@Salban The schema is only used as a mapping table. The data itself is then mapped into an AI Search index whenever the flow is executed, and stored there for retrieval when the customer queries. The index itself needs to be populated once, and then it can be updated at whatever frequency is desired.

Carlo Jimenez · ‎10-16-2024

Hi @Shamus Mulhall and @Gerard Dwan, were there updates or changes to navigation in the Washington DC release? I noticed that indexed external sources were appended by "searchTerm=" followed by the name of the indexed item. Can you point me in the right direction on where I should be looking? Thank you!

AnamN · ‎10-24-2024

Hi @Shamus Mulhall

Thanks for the details

One question: as the external AI Search tables are indexed, is it possible to query and get the data from these tables? Is there any API that can do that? I couldn't find one in the ServiceNow docs.

Thanks,

Anam

adambuj · ‎10-24-2024

@AnamN The data is not actually stored in the ServiceNow instance. It uses the external content schema table as a mapping to the AI Search index. As such, there's no direct way outside of using the AI Search experience to extract the content.

AnamN · ‎10-24-2024

@adambuj Thanks for your response

So, you mean it's possible to do in the AI Search experience. How can I do that?

Actually, my requirement is to build logic on the data retrieved from the external content schema. I don't want to store it in a table because the data is very large. Can I access it directly from the external content schema table using the AI Search experience?

Vineet Yadav1 · ‎11-07-2024

@Anam I also have the same requirement. Did you find any solution?

svani · ‎01-06-2025

Hi All,

I have followed the same steps, but I am getting this error message: "Failed to iterate on data stream: com.glide.transform.transformer.exceptions.InvalidStructureException: JsonStreamParser[0]: JSON must be an object or an array: '<' ". Can anyone please help me here?

Thanks in advance

SivaK7752622441 · ‎02-05-2025

Is there any way to setup user criteria or permissions around the content indexed this way?

Kass3m · ‎09-22-2025

@Shamus Mulhall are any of these steps needed if you are using the Sharepoint Connector with XCC?