Consolidating External Sources into Knowledge v3 search

Sarup Paul · 05-11-2016

Knowledge v2 (Eureka and earlier) had a feature called Navigation Add-ons, which showed search results from external sources. The user experience wasn't ideal, so the feature was deprecated. A more robust solution is on our roadmap.

The HI team at ServiceNow (way to go, zirnhelt!) has used our platform's customizability to build a framework that crawls external sources such as Documentation and Community. We want to share some of those ideas so that you can implement them for your own requirements. The framework uses scheduled jobs to extract content from other sources and relies on Zing search to index that content and return search results. The diagram below shows the flow of data.

[Figure: data flow diagram (zing doc community kb.jpg)]

Let's walk through the various components and see where the customizations were made.

Scheduled Job

Most of the framework lives in a scheduled job that calls the external content source's APIs to extract content. The job then saves each piece of content as an individual article in a separate Knowledge Base. Because the job runs periodically, the content stays synchronized with the KB.
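The post doesn't include code, and in ServiceNow the job itself would be server-side JavaScript. The following Python is only a language-neutral sketch of one job run, assuming the external API is paginated and returns a hypothetical `{"items": [...], "has_more": bool}` shape. `fetch_page` and `save_article` are hypothetical callables standing in for the real API call and for the article create/update step.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical response shape for one page of the external API:
#   {"items": [ {...}, ... ], "has_more": True/False}
FetchPage = Callable[[str, int], Dict[str, Any]]
SaveArticle = Callable[[str, Dict[str, Any]], None]


def extract_items(endpoint: str, fetch_page: FetchPage) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paginated source, requesting one page at a time."""
    page = 0
    while True:
        result = fetch_page(endpoint, page)
        yield from result.get("items", [])
        if not result.get("has_more"):
            break
        page += 1


def run_job(endpoints: List[str], fetch_page: FetchPage, save_article: SaveArticle) -> int:
    """One run of the scheduled job: extract each source and save every item.

    Returns the number of items handed to save_article.
    """
    count = 0
    for endpoint in endpoints:
        for item in extract_items(endpoint, fetch_page):
            save_article(endpoint, item)  # create or update the wrapper article
            count += 1
    return count


if __name__ == "__main__":
    # Fake two-page source standing in for a real API call.
    pages = {
        0: {"items": [{"id": "doc-1"}, {"id": "doc-2"}], "has_more": True},
        1: {"items": [{"id": "doc-3"}], "has_more": False},
    }
    total = run_job(
        ["https://docs.example.com/api/articles"],  # hypothetical endpoint
        lambda endpoint, page: pages[page],
        lambda endpoint, item: print("save", item["id"]),
    )
    print(total)  # 3
```

Passing the fetch and save steps in as functions keeps the paging loop independent of any particular source, which is one way to let a single job serve several configured sources.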

To manage multiple content sources effectively, you need a configuration object for each source.

Configuration Object

The config object stores details about the external content source and how its content will be mapped into the Knowledge Base on your instance. Here are some of the details you should consider managing:

Target Knowledge Base	The KB in which the extracted content is stored as individual articles. Create the target Knowledge Base before you configure this.
Security	Set up access to the articles that the scheduled job creates from a particular source.
Source URL endpoint (or other config)	A config value or URL that the scheduled job uses to call the external content source.
Redirect URL	The base URL used to build the final URL of the source content when returning search results.
Security	Define the content security.
Category Mapping	Define which KB categories are assigned to the articles created from the content source. Create the categories in the target Knowledge Base before you define this mapping.

[Figure: category mapping (category mapping.jpg)]
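As a rough illustration, the config object could be modeled as one record per source. This Python sketch is an assumption, not the HI team's schema: the field names are illustrative, and it assumes the category mapping is keyed on URL prefixes, with the longest matching prefix winning so that more specific paths can map to more specific categories.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceConfig:
    """One record per external content source (field names are illustrative)."""
    name: str
    target_kb: str            # KB that holds the wrapper articles; create it first
    source_endpoint: str      # URL or other config the scheduled job calls
    redirect_base_url: str    # base URL used to build click-through links
    roles: List[str] = field(default_factory=list)  # who may read the articles
    # URL prefix in the source -> category that already exists in target_kb
    category_map: Dict[str, str] = field(default_factory=dict)
    default_category: Optional[str] = None

    def category_for(self, source_path: str) -> Optional[str]:
        """Return the category of the longest matching URL prefix."""
        matches = [p for p in self.category_map if source_path.startswith(p)]
        if not matches:
            return self.default_category
        return self.category_map[max(matches, key=len)]


if __name__ == "__main__":
    # All values below are hypothetical examples.
    docs = SourceConfig(
        name="Documentation",
        target_kb="External Documentation",
        source_endpoint="https://docs.example.com/api/articles",
        redirect_base_url="https://docs.example.com",
        roles=["itil"],
        category_map={"/istanbul/": "Istanbul", "/istanbul/it-service-management/": "Istanbul ITSM"},
        default_category="General",
    )
    print(docs.category_for("/istanbul/it-service-management/incident.html"))  # Istanbul ITSM
    print(docs.category_for("/kingston/release-notes.html"))  # General
```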

Synchronization Logic

Once the content is extracted, compare it against the existing articles. A checksum of each item's content is a good way to detect changes.

If the content is new, create a new article.
If the content exists in the KB and has been updated, update the article.
If the content exists in the KB and has not been modified, ignore it.
If the content exists in the KB but is no longer found in the content source, expire the 'containing article' so that it no longer appears in search.
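The four rules above can be sketched as a single planning step. This Python version assumes, hypothetically, that each wrapper article stores a SHA-256 checksum of the content it was built from, and that `source` is a complete snapshot of the external system (otherwise missing items can't be told apart from items that simply weren't fetched).

```python
import hashlib
from typing import Dict, List


def checksum(text: str) -> str:
    """SHA-256 of the extracted content; stored on the wrapper article."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source: Dict[str, str], existing: Dict[str, str]) -> Dict[str, List[str]]:
    """Decide what to do with each item.

    source:   source_id -> content currently in the external system (full snapshot)
    existing: source_id -> checksum stored on the wrapper article in the KB
    """
    actions: Dict[str, List[str]] = {"create": [], "update": [], "ignore": [], "expire": []}
    for source_id, content in source.items():
        stored = existing.get(source_id)
        if stored is None:
            actions["create"].append(source_id)  # new content
        elif stored != checksum(content):
            actions["update"].append(source_id)  # changed since the last sync
        else:
            actions["ignore"].append(source_id)  # unchanged
    # In the KB but gone from the source: expire the containing article.
    actions["expire"] = sorted(set(existing) - set(source))
    return actions


if __name__ == "__main__":
    existing = {"a": checksum("old A"), "b": checksum("B"), "c": checksum("C")}
    print(plan_sync({"a": "new A", "b": "B", "d": "D"}, existing))
    # {'create': ['d'], 'update': ['a'], 'ignore': ['b'], 'expire': ['c']}
```

Comparing checksums rather than full text means the KB only needs to keep a short fingerprint per article to detect changes.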

Incremental crawl vs. Full crawl

The synchronization process should be fine-tuned to how frequently the external content source changes.

An incremental crawl, which adds new articles and updates existing ones, can run more frequently.
A full crawl, which also compares existing content to detect deletions, should run less frequently. It hits the source content system heavily, so some planning and throttling is a good idea.
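A minimal Python sketch of the two crawl modes follows. It assumes, hypothetically, that the source can list items changed since the last run (for the incremental case) and can list all item IDs (for the full case). The batch size, pause length, and full-crawl interval are placeholder values to tune against the source system, not recommendations from the original framework.

```python
import time
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, List, Set, Tuple

Fetch = Callable[[str], str]  # hypothetical: item id -> extracted content


def choose_mode(last_full: datetime, now: datetime, full_every: timedelta) -> str:
    """Run a full crawl only when the full-crawl interval has elapsed."""
    return "full" if now - last_full >= full_every else "incremental"


def incremental_crawl(changed_ids: Iterable[str], fetch: Fetch) -> Dict[str, str]:
    """Fetch only the items the source reports as new or changed.

    Deletions are not visible here, so nothing should be expired from this result.
    """
    return {item_id: fetch(item_id) for item_id in changed_ids}


def full_crawl(all_ids: List[str], fetch: Fetch, known_ids: Set[str],
               batch_size: int = 50, pause_s: float = 1.0) -> Tuple[Dict[str, str], Set[str]]:
    """Fetch every item in throttled batches and report ids that disappeared."""
    snapshot: Dict[str, str] = {}
    for start in range(0, len(all_ids), batch_size):
        for item_id in all_ids[start:start + batch_size]:
            snapshot[item_id] = fetch(item_id)
        if start + batch_size < len(all_ids):
            time.sleep(pause_s)  # throttle between batches to spare the source system
    deleted = known_ids - set(snapshot)  # candidates for expiring
    return snapshot, deleted


if __name__ == "__main__":
    print(choose_mode(datetime(2016, 5, 1), datetime(2016, 5, 11), timedelta(days=7)))  # full
```

Keeping deletion detection in the full crawl only reflects the trade-off above: spotting a removed item requires seeing the whole source, which is the expensive part.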

Wrapper/Holding Article

Each article created by this process needs a few additional values set.

Knowledge Base	Set the Knowledge Base from the config object for that particular external source.
Category	Set the category using the config object's mapping from an attribute of the source (such as the URL) to a category defined in that Knowledge Base.
Roles	The roles to which the external content will be available.
Language	The language in which the imported article will be available.
Text Body	It's advisable to strip the HTML from the external content and store only the text of the article in the wrapper article.
Meta	You may use an external service to generate the meta values associated with the article.
Valid to date	The date until which the wrapper article remains available. Keeping this date short is recommended so that stale articles are not indexed.
Workflow State	Set the article state to Published so that it is visible to all, and don't attach any workflows.
Click-through URL	Replace the default article URL with the URL of the source. You will typically concatenate the content source's base URL with the URL of that article.
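To make the table concrete, here is a Python sketch that assembles the field values for one wrapper article. The dictionary keys are illustrative labels that mirror the table, not actual table column names. The 30-day validity window, the URL-prefix category mapping, and the `keywords` input for meta are hypothetical choices. HTML is stripped with the standard library's `html.parser`, skipping script and style contents.

```python
from datetime import date, timedelta
from html.parser import HTMLParser
from typing import Any, Dict, List
from urllib.parse import urljoin


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0

    def handle_starttag(self, tag: str, attrs: Any) -> None:
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag: str) -> None:
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data: str) -> None:
        if not self._skip:
            self.parts.append(data)


def strip_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())  # collapse whitespace


def build_wrapper(item: Dict[str, Any], config: Dict[str, Any], today: date,
                  valid_days: int = 30) -> Dict[str, Any]:
    """Field values for one wrapper article (keys are illustrative, not real column names)."""
    path = item["path"]
    category_map: Dict[str, str] = config["category_map"]
    prefix = max((p for p in category_map if path.startswith(p)), key=len, default=None)
    return {
        "knowledge_base": config["target_kb"],
        "category": category_map[prefix] if prefix is not None else None,
        "roles": config["roles"],
        "language": config.get("language", "en"),
        "title": item["title"],
        "text": strip_html(item["html"]),
        "meta": item.get("keywords", ""),  # e.g. from an external keyword service
        "valid_to": today + timedelta(days=valid_days),  # short, so stale copies drop out
        "workflow_state": "published",  # published, no workflow attached
        "click_through_url": urljoin(config["redirect_base_url"], path),
    }
```

For example, an item with path `/istanbul/reset.html` and a redirect base URL of `https://docs.example.com` (both hypothetical) gets the click-through URL `https://docs.example.com/istanbul/reset.html`.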

I know some of you may ask for an update set that implements this, but we are unable to share one at this time. We hope this framework gives you a guideline for implementing external source search on your instance.

===== Update April 21, 2018 =====

In Kingston, we released a feature that uses the methodology above to search external content. For more details, see Kingston Docs: External Content Integration for Knowledge.
