
Anish Reghu
09-03-2023 03:21 AM - edited 09-03-2023 03:32 AM
An SQL server was being decommissioned, and we were required to complete the migration of all articles held on it as early as possible and start using them in ServiceNow. Here's the process we followed, which ultimately helped us import a total of 10,742 articles.
Analyzing the source:
- Both content and media were required.
- The owner of the SQL server was contacted.
- Content and media lived in two different data models.
- The schema was obtained from the SQL server owner as an Excel file.
- Content had a unique identifier named “lbd_id”; media did not have one.
- A unique identifier was therefore devised for media: filename + hash_id.
Checking existing resources:
- Existing transform maps were identified but were of no use, as they were specific to an import from a different source.
- A staging table for attachments had not yet been identified and would have to be created.
The middleware - Orchestra
- An Orchestra technical user was given access to the import tables in SNOW.
- Error handling in Orchestra was yet to be defined.
Staging table
- The OOTB attachment API was used for storing attachments (see the sketch after this list).
- A proposal to add a column for logging any operation done on imported records was accepted.
- The staging table was made an extension of the Import Set Row table to avoid the 8K payload limit on data imports.
- A WSDL for the staging table was created and shared with Orchestra.
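A minimal sketch of storing a file via the OOTB GlideSysAttachment API on the server side. The staging table name, record number, file name, and content below are placeholders, not the actual project values:

```javascript
// Minimal sketch, assuming the OOTB GlideSysAttachment server API;
// table, record number, file name, and content are placeholders.
(function storeAttachment() {
    var staging = new GlideRecord('u_kb_import_staging'); // hypothetical table
    if (!staging.get('number', 'IMP0001001'))
        return;

    // Content fetched from the source system (placeholder).
    var sourceFileContent = '...';

    // write() stores the file against the record and returns the
    // sys_attachment sys_id, which is kept for later URL rewriting.
    var gsa = new GlideSysAttachment();
    var attachmentSysId = gsa.write(staging,
        'a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6_diagram.png', // <ID_filename>
        'image/png',
        sourceFileContent);

    gs.info('Stored attachment ' + attachmentSysId);
})();
```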
The actual import
The web service
- A Scripted web service was initially suggested as the medium of import; the architect suggested an import set table instead, to utilize OOTB options.
- A REST web service was proposed as the medium to fetch data (see the sketch after the list of reasons below).
Reasons for choosing REST:
- SOAP needs more bandwidth, while REST is lightweight.
- SOAP requires processing of the message content, while REST needs no such processing.
- SOAP is intolerant of errors, and the XML used in it can get extremely complex.
- REST uses the HTTP 1.1 verbs (GET, POST, PUT, and DELETE) to perform tasks, whereas SOAP need not use HTTP at all and can run over SMTP or any other transport.
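A minimal sketch of what the middleware-side REST call might look like, assuming the OOTB Import Set API (`/api/now/import/{table}`). The instance URL, credentials, staging table, and field names are placeholders:

```javascript
// Minimal sketch of the middleware-side REST call, assuming the OOTB
// Import Set API. Instance, credentials, table, and fields are placeholders.
const instance = 'https://example.service-now.com';
const stagingTable = 'u_kb_import_staging'; // hypothetical staging table

async function pushArticle(article) {
    const response = await fetch(`${instance}/api/now/import/${stagingTable}`, {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Authorization': 'Basic ' +
                Buffer.from('orchestra.user:secret').toString('base64')
        },
        body: JSON.stringify({
            u_lbd_id: article.lbdId,   // unique identifier at source
            u_title: article.title,
            u_content: article.html
        })
    });
    // Hand the HTTP status back to the caller for error handling.
    return { ok: response.ok, status: response.status,
             body: response.ok ? await response.json() : null };
}
```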
The error handling
- REST does not have built-in error handling.
- Hence, error handling in Orchestra had to be defined.
- The scenario identified for error handling was the invocation of the endpoint failing from ORC to SNOW.
- We used the OOTB HTTP error codes to do the error handling, as sketched below.
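A minimal sketch of HTTP-status-based error handling on the middleware side. The retry policy shown is illustrative, not the one actually configured in Orchestra; it reuses the hypothetical pushArticle() from the previous sketch:

```javascript
// Minimal sketch: branch on the OOTB HTTP status codes returned by
// the instance. The retry policy here is illustrative only.
async function pushWithErrorHandling(article, maxRetries = 3) {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
        const result = await pushArticle(article);
        if (result.ok)
            return result;
        if (result.status >= 400 && result.status < 500) {
            // 4xx: the payload itself was rejected; retrying won't help.
            throw new Error(`Article rejected with HTTP ${result.status}`);
        }
        // 5xx or network failure: possibly transient (e.g. the endpoint
        // invocation from ORC to SNOW failing), so retry.
    }
    throw new Error(`Gave up after ${maxRetries} attempts`);
}
```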
Mandatory fields for import
- The unique ID of the article at source, the article title, and the content were made mandatory for importing data/content into the staging table, enforced as in the sketch below.
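A minimal sketch of enforcing those mandatory fields in an onBefore transform script; the u_* field names are placeholders for the actual staging table columns:

```javascript
// Minimal sketch of an onBefore transform script enforcing the mandatory
// fields; the u_* field names are placeholders.
(function runTransformScript(source, map, log, target) {
    if (source.u_lbd_id.nil() || source.u_title.nil() || source.u_content.nil()) {
        // Skip this row and flag the failure on the import set row.
        ignore = true;
        error = true;
        error_message = 'Mandatory field missing: source unique ID, title and content are required';
    }
})(source, map, log, target);
```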
Mapping attachments
- Proposal: use “hash id” + “filename” for mapping attachments into the staging table.
- To keep each attachment name unique, a 36-character attribute was identified as the unique ID; the field name follows the convention <ID_filename>.
- This ID can be found in every article's source code, which helps with troubleshooting.
- Each attachment is therefore looked up by this ID and mapped, as in the sketch below.
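A minimal sketch of locating an imported attachment by the <ID_filename> naming convention; the 36-character ID and file name shown are placeholders:

```javascript
// Minimal sketch: look up the stored attachment by its unique
// "<ID_filename>" name; the example name is a placeholder.
function findAttachmentByUniqueName(uniqueFileName) {
    var att = new GlideRecord('sys_attachment');
    att.addQuery('file_name', uniqueFileName);
    att.query();
    if (att.next())
        return att.getUniqueValue(); // sys_id used for URL rewriting
    return null;
}

var sysId = findAttachmentByUniqueName(
    'a1b2c3d4-e5f6-a7b8-c9d0-e1f2a3b4c5d6_diagram.png');
```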
Replacing the URL path on attachments
- Each attachment arrives in the form of a URL.
- The URL contains a source path, which must be replaced with the local ServiceNow instance path and the sys_id of the attachment.
The dummy article
- To do the above, we first need to store the attachment in ServiceNow.
- We used a dummy article protected by a before-delete Business Rule (BR) to prevent accidental deletion of the article, as sketched below.
- The sys_id of the dummy article was stored in a system property and referenced in the BR.
- A backup XML of the article was stored in the story where the idea of the dummy article was devised.
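A minimal sketch of the before-delete BR guarding the dummy article; the system property name is hypothetical:

```javascript
// Minimal sketch of a before-delete Business Rule on kb_knowledge;
// 'kb.import.dummy_article' is a hypothetical property name.
(function executeRule(current, previous /*null when async*/) {
    var dummySysId = gs.getProperty('kb.import.dummy_article');
    if (current.getUniqueValue() == dummySysId) {
        gs.addErrorMessage('This article anchors imported attachments and cannot be deleted.');
        current.setAbortAction(true); // cancel the delete
    }
})(current, previous);
```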
Positioning of the attachments in the article
- If an article has multiple attachments at source, each path must be replaced with the proper URL such that the positioning of the attachments within the article remains undisturbed.
- The source URL format was stored in a regex and used to identify the proper positioning of the attachments (see the sketch below).
- Once identified, the source path is first replaced with a sys_attachment reference for each attachment, which is later linked to the target article.
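A minimal sketch of the in-place URL rewriting. Replacing each URL substring where it stands keeps the attachments positioned exactly as they were at source; the source URL pattern below is a placeholder for the regex actually stored:

```javascript
// Minimal sketch of rewriting source attachment URLs in the article body;
// the source pattern is a placeholder for the regex actually stored.
function replaceAttachmentUrls(html, fileNameToSysId) {
    // Hypothetical source format: http://legacy-host/media/<file name>
    var srcPattern = /http:\/\/legacy-host\/media\/([\w\-.]+)/g;
    return html.replace(srcPattern, function (match, fileName) {
        var sysId = fileNameToSysId[fileName];
        // Leave the URL untouched if the attachment was not imported,
        // so the failure stays visible instead of silently breaking.
        if (!sysId)
            return match;
        return '/sys_attachment.do?sys_id=' + sysId;
    });
}
```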
Methods used to push data:
- Batch import was suggested because of the bulk of data (12,000+ articles).
- The batch import was suggested only for the initial load.
- insertMultiple was the function call used for the batch import (see the sketch below).
- insertMultiple was preferred because calling the web service once per article would have had a significant impact on system performance.
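A minimal sketch of what the insertMultiple SOAP envelope might look like for the initial batch load; the namespace, table name, and fields are placeholders, and real code would also XML-escape the field values:

```javascript
// Minimal sketch of building an insertMultiple envelope; namespace,
// table, and fields are placeholders, and values are not escaped here.
function buildInsertMultipleEnvelope(articles) {
    var records = articles.map(function (a) {
        return '<record>' +
               '<u_lbd_id>' + a.lbdId + '</u_lbd_id>' +
               '<u_title>' + a.title + '</u_title>' +
               '<u_content><![CDATA[' + a.html + ']]></u_content>' +
               '</record>';
    }).join('');

    return '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"' +
           ' xmlns:u="http://www.service-now.com/u_kb_import_staging">' +
           '<soapenv:Body><u:insertMultiple>' + records +
           '</u:insertMultiple></soapenv:Body></soapenv:Envelope>';
}
```

One envelope carrying many `<record>` elements is what avoids the per-article web service calls mentioned above.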
Parsing of the article:
- Ensure that the payload from Orchestra is in proper XML format so that parsing progresses smoothly.
- Articles from the SQL system were imported as plain HTML-formatted content (but filled with third-party-system-specific HTML attributes).
- The HTML content is assumed to be well formed (according to XML standards) so it can be parsed via the DOM API to find the relevant tags of interest.
- The OOTB XMLDocument script object is reused to parse the HTML content and navigate the DOM structure, as in the sketch below.
- Reference - https://docs.servicenow.com/bundle/jakarta-application-development/page/script/server-scripting/conc...
- The unique identifier at source is the first element parsed from the XML payload and is mapped to the relevant field on the staging table.
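A minimal sketch of pulling the source unique ID out of a payload with the OOTB XMLDocument script include; the payload shape and element names are placeholders:

```javascript
// Minimal sketch using the OOTB XMLDocument script include; the payload
// and element names are placeholders for the real Orchestra payload.
var payload = '<article><lbd_id>LBD-0042</lbd_id>' +
              '<title>Sample</title>' +
              '<content><![CDATA[<p>Body</p>]]></content></article>';

var xmlDoc = new XMLDocument(payload);
// XPath into the DOM to find the tags of interest.
var lbdId = xmlDoc.getNodeText('//article/lbd_id');
var title = xmlDoc.getNodeText('//article/title');
gs.info('Parsed source ID ' + lbdId + ' (' + title + ')');
```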
NOTE – The import of data and attachments must take place before parsing and correlation take place.
After Import
- The state of the article changes from Draft to Published.
- The state of the relevant record in the staging table changes from Draft to Processed.
- If an article has an attachment but, for some reason, it is not processed through to the target table, both the staging record and the article on the kb_knowledge table remain in Draft.
- The staging table was updated using UI messages for stages such as:
  - Article is created,
  - State is Draft,
  - Article is updated,
  - Articles without attachments,
  - Articles with attachments,
  - Path replacement,
  - State changes to Processed,
  - State changes to Error on the staging table,
  - Error messages for mandatory fields on the staging table.
- A field on the staging table was dedicated to storing the target KB article number. This field is updated on successful import, after the acknowledgement from Orchestra (see the sketch below).
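A minimal sketch of how such a staging update might look, assuming hypothetical u_state, u_log, and u_target_article fields and a hypothetical UI message key resolved via gs.getMessage():

```javascript
// Minimal sketch of stamping the staging record after a successful import;
// field names, the state value, and the UI message key are placeholders.
function markProcessed(stagingSysId, kbNumber) {
    var row = new GlideRecord('u_kb_import_staging');
    if (!row.get(stagingSysId))
        return;
    row.u_target_article = kbNumber;  // dedicated target-number field
    row.u_state = 'processed';        // Draft --> Processed
    // Append the stage to the operations-log column using a UI message.
    row.u_log = row.u_log + '\n' + gs.getMessage('kb.import.state.processed');
    row.update();
}
```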
Cleanup jobs:
- The staging table was scheduled to be cleaned every 30 days to ensure we do not hold data that has already been processed.
- Before deletion, it was carefully validated that every record is fully processed at the target table and older than 30 days, so that only the proper records are deleted (see the sketch below).
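A minimal sketch of the cleanup as a scheduled job script; the table, field names, and state value are placeholders:

```javascript
// Minimal sketch of the 30-day cleanup; table, fields, and the
// 'processed' state value are placeholders.
(function cleanupStagingTable() {
    var row = new GlideRecord('u_kb_import_staging');
    row.addQuery('u_state', 'processed');                // fully processed
    row.addNotNullQuery('u_target_article');             // confirmed at target
    row.addQuery('sys_updated_on', '<', gs.daysAgo(30)); // older than 30 days
    row.query();
    while (row.next()) {
        row.deleteRecord();
    }
})();
```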