How to prepare your data before creating IH-ETL transform map

Siyou Li · ‎11-08-2021

Previous Step

Before creating an ETL transform map, we need to first insert the raw data into ServiceNow platform by using standard data source function. Then, the ETL transform map associated with this data source will process data inserted by this data source.

If you would like to read examples about how to create a data source, please go to this article.

IH-ETL supports two types of data source – flat and nested

Flat – Format such as Excel, CSV

Nested – Format such as JSON, XML

Example flat data and how to set up

Example nested format and how to set up

Video for example nested payload data source set-up

IH-ETL nested payload demo 0:00 - 4:00

Modeling Guidance for Related Data

Often your third-party API provides data that is not supported in the existing CMDB data model. When this occurs, there are several options:

Normalize the data and use an existing field

Create a custom extended class

Create a custom field on an existing class

Map the data to a new related table

Normalizing the data to the existing data format and using an existing field enables the most functionality. Any ServiceNow applications that use the existing fields do not require modification to use your data.

Creating an extended class can adversely affect Discovery and data from other sources. If your class is reclassified, your custom fields are lost. For example, if you have a machine in My Custom Computer Class and Discovery reclassifies it to Linux Server it gains the fields of the Linux Server class but loses your custom fields. If you choose to create a custom extended class, your app requires additional approval from the ServiceNow Model Council.

Creating custom fields on a parent class increases installation time because the fields are added to existing tables. It can also pollute the CI model and trigger unnecessary changes to the core CI. If you choose this option your app needs additional approval from the ServiceNow Model Council.

The final option is to create a custom table with whatever fields best model the source data. This does not impact existing CIs, does not require custom columns on source data, and generally provides the cleanest model. This guide outlines that approach.

Below is an example use case:

In this example, the computer data from the API source includes the following fields:

"last_boot": "2020-02-20 13:22:23",

"last_login_datetime": "2020-02-19 01:56:05",

"last_login_user": "john.adams",

"agent_version": "1.10",

"vpn_enabled": true,

"last_scan_for_threats": "2020-02-24 03:13:29"

There are no existing fields for this data. In this example, this data is monitored by an agent named Tutorial Agent. To store the new fields in the CMDB, store them in a new table and relate that table to the CI.

This example uses the following schema to persist the data:

Tips and Caveats

Which type to choose? Flat or nested

Rule of thumb, if raw data has “containment” or “association” connection between each other, then nested format would be best, for example, each computer has several network adapters associated.

However, if you already have flat format but find out your data has nested feature, it is possible to represent nested data using flat format by duplicate data rows. Below is an example scenario – each computer has two network adapters associated with.

Use nested format (JSON)

Use flat format (CSV) - because this computer has 2 adapters, we need to duplicate this computer data twice

Data source preparation checklist

Pull data source to make sure it is working fine before using it in IH-ETL

20 records test load import sets are not supported by IH-ETL. Load all records pull is required

Make sure data source contains records. Data source pull without any record will cause “There are no records associate to the import set” error.

Data source’s application scope must match the target ETL transform map’s application scope

1 data source can be mapped to only 1 ETL Transform map or 1 Table Transform map

Nested format must mark “Data in single column” as true when first time setting up data source

When using a nested data JSON payload, the following restrictions apply:

Field names must start with a letter (between A-Z or a-z) or with ‘_’, and must only contain letters (between A-Z or a-z), digits (0-9), or the ‘_’ character

For example, a field name can't contain special characters such as *, [,], #, $, spaces, and dot

Field names can't be “temp” or “object”, which are reserved for internal use

Consistently throughout the payload, you must use an array or an object to represent data in a specific level, regardless of the number of items in the level. If you use an array for multiple items in one object, you must also use an array to represent a single item in other objects

Additional Resources

Landing page

IH-ETL common error scenarios

Next Step

How to set up an IH-ETL transform map