
IntegrationHub ETL for inbound integrations - Import Set API + JSON Column

KN
Tera Contributor

First of all, this isn't a question but rather an answer that hopefully provides some assistance for anyone else struggling with how to trigger IH-ETL for inbound integrations (POSTing to an import set with the Import Set API).

 

 

1. Create a data source with the type Custom (Load by Script) and make sure Data in single column is set to true.

 

2. Add a JSON column in the script like so (the second parameter is the length of the field): "import_set_table.addJSONColumn('u_data', 64000);". Remember to run "Load all records" so the import set table is created.
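For reference, here's a minimal sketch of what the Custom (Load by Script) script could look like, assuming the default loadData signature and the u_data column from this example. The sample insert at the end is only an illustration so that "Load all records" has something to create the staging table from; adjust or remove it as needed:

(function loadData(import_set_table) {

    // Create a single JSON column; the second parameter is the field length.
    import_set_table.addJSONColumn('u_data', 64000);

    // Optional sample row so "Load all records" creates the import set table.
    // Whether the JSON should be passed as a string or an object may depend on
    // your setup, so treat this row purely as an illustration.
    import_set_table.insert({
        u_data: JSON.stringify({ name: 'SAMPLE', description: 'sample row' })
    });

})(import_set_table);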

 

3. Create your ETL (only fill in the basic details, i.e. the first step, for now).

 

4. Create a REST Insert Multiples entry in sys_rest_insert_multiple. Select your source table (the newly created import set table) and make sure you pick the correct Transformation type for your payload and requirements. If you select the asynchronous transformation type, the import set rows won't be processed automatically right after they're inserted; based on my findings, a scheduled job that runs every minute triggers the transformation instead. The use case for the asynchronous method is a payload that contains multiple items. If you select the synchronous transformation type, the import set rows start processing immediately. Make sure that "Validate request" and "Use data source format" are selected. After saving your REST Insert Multiples record, add a column mapping with type "JSON" and Column mapping set to "Column name".

 

5. POST to your import set. Here's an example of the JSON body that should be used for insertMultiple, from the developer docs (a client-side sketch of the actual request follows the example bodies below):

{
   "records":[
      {
         "<Column Label 1>":"<value>",
         "<Column Label 2>":"<value>"
      },
      {
         "<Column Label 1>":"<value>",
         "<Column Label 2>":"<value>"
      }
   ]
}

For POSTing data to the JSON column, the body should look like this (for a single item):

{
  "records": [
    {
      "u_data": {
        "my_key_1": "my_value_1",
        "my_key_2": "my_value_2",
        "my_key_3": "my_value_3"
      }
    }
  ]
}

For inserting multiple items the JSON body structure should be like this: 

{
  "records": [
    {
      "u_data": {
        "name": "FII",
        "description": "my pc"
      }
    },
    {
      "u_data": {
        "name": "FOO",
        "description": "my phone"
      }
    },
    {
      "u_data": {
        "name": "FUM",
        "description": "my tamagotchi"
      }
    }
  ]
}
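And to illustrate the request itself, here's a minimal client-side sketch (Node 18+, using the global fetch) of POSTing one of these bodies to the insertMultiple endpoint. The instance URL, the credentials, and the staging table name u_my_json_import are placeholders for whatever your own setup uses:

// Minimal sketch only: replace the instance URL, credentials and staging table
// name (u_my_json_import) with your own values.
async function postToImportSet() {
    const payload = {
        records: [
            { u_data: { name: 'FOO', description: 'my phone' } },
            { u_data: { name: 'FUM', description: 'my tamagotchi' } }
        ]
    };

    const res = await fetch(
        'https://<instance>.service-now.com/api/now/import/u_my_json_import/insertMultiple',
        {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Accept': 'application/json',
                // Basic auth shown for brevity; use whatever auth your instance requires.
                'Authorization': 'Basic ' + Buffer.from('username:password').toString('base64')
            },
            body: JSON.stringify(payload)
        }
    );

    console.log(res.status, await res.json());
}

postToImportSet();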

 

6. After a successful POST, you should be able to retrieve your newly created import set in ETL as a sample. The attributes from your JSON are automatically extracted as mappable key-value pairs, just like in a setup where CMDB data is pulled via GET methods.

 

JSON columns add a dynamic layer to CMDB integrations: we no longer have to worry about things like the length of our import set fields, and adding new attributes doesn't require much effort. When the integration pushes a new field into the JSON column, you can simply open the ETL, retrieve the newest sample import set, and the new field is there for you to map.

 

I haven't had time yet to investigate how to get the same setup working without insertMultiple, but it's something I'll try to check and then update this same post.

 

Docs (look especially for the section "POST /now/import/{stagingTableName}/insertMultiple"):

https://developer.servicenow.com/dev.do#!/reference/api/zurich/rest/c_ImportSetAPI#import-GET?navFil...


SteelStream
Tera Contributor

Dear KN,

Thank you for your insightful article on leveraging ServiceNow's ETL capabilities for inbound integrations. Your approach using JSON columns for dynamic data handling is particularly innovative. As I delve deeper into ETL solutions, I have a few questions to better understand the nuances and best practices associated with your described architecture.

  1. Data Validation and Error Handling at Ingestion: Regarding the custom data source script, what are your recommendations for validating the incoming JSON string? Specifically, should we implement checks within this script for JSON format validity, length constraints (beyond the u_data column definition), or character encoding issues to prevent data processing failures downstream?
  2. ETL Scripting vs. Transform Scripts for Field-Level Logic: My understanding is that once the JSON key/value pairs are extracted into the ETL workspace, field-specific transformations, validations, and additional scripting logic would typically reside within the ETL Definition's Transform Scripts. Could you confirm this understanding and perhaps elaborate on scenarios where scripting might still be necessary outside of the ETL Definition (e.g., in the data source script or as business rules)?
  3. u_data Column Length Limit: You specified a length of 64000 characters for the u_data JSON column (import_set_table.addJSONColumn('u_data',64000);). Is this the maximum practical or configurable length for such a column in ServiceNow, or is it a recommended best practice for typical integration scenarios? What are the implications of exceeding this value, or using a much smaller one?
  4. JSON Column Structure for Multiple Records: When ingesting multiple records via the insertMultiple API, how is the JSON column structurally composed? Specifically, within the u_data field of a single records entry in the POST payload, does it accommodate multiple CI key/value sets (e.g., an array of CI objects)? Or, as implied by your example ("u_data": {"my_key_1":"my_value_1", ...}), does each records entry (and thus each row in the import set) primarily represent a single CI's complete set of key/value pairs? Understanding this delineation would clarify how the number of importable records per u_data field is affected. (Edit: This question was answered in the article update.)
  5. Handling New Attributes and ETL Context: If an integration's key/value pairs are initially defined and associated with existing ETL mappings and error handling, what is the behavior when the source system introduces a new attribute not previously seen? Is there a risk of unintended automatic mapping or updates to other fields, or does each new attribute explicitly require manual intervention and mapping within the ETL context to be processed? This is crucial for managing schema evolution.
  6. Performance Considerations: Have you observed any performance bottlenecks or specific thresholds to be mindful of when processing very large JSON datasets within the u_data column or through the ETL framework, particularly regarding the parsing and extraction of key/value pairs?
  7. Versioning and Deployment of ETL Definitions: How do you typically manage versioning and deployment of ETL definitions and associated custom scripts across different ServiceNow instances (e.g., Dev, Test, Prod)? Are there specific approaches or tools you recommend for this?
  8. Alternative Data Source Types: While you've opted for a "Custom (Load by Script)" data source, have you explored or considered other data source types (e.g., REST, SOAP) for similar integration patterns, and what were the primary factors that led to your chosen approach?

Thank you again for sharing your expertise. I look forward to your insights on these follow-up questions.

KN
Tera Contributor

Hey-oh and thanks for your questions!

1. Data validation & error handling is something that needs to be designed separately, because the incoming JSON is inserted via the Import Set API and, as far as I know, you can't build any validations on the data source side. If the structure of the JSON isn't correct or something else is wrong, the response will return an error, e.g. "Error in reading json stream parser". There's still work to be done on data validation & error handling in this solution, because right now it really trusts that the sender has implemented checks so that valid JSON is sent. Eager to see if someone has ideas about this!

 

2. Yes, ETL consists of field-specific transformations plus the possibility to create before/after scripts (scripting options are available through the Robust Transformer). From my experience, the same use cases apply whether you're using transform maps or ETL for ingesting your data. For example, in a domain-separated environment you could create a business rule that forces all CMDB data on insert into a specific domain, e.g. the same domain as the CMDB record's company; see the sketch below.
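To illustrate that domain example, here's a rough sketch of what such a before-insert business rule could look like. The table, field names and conditions are assumptions, so adjust them to your own data model:

// Sketch of a before-insert business rule on a CMDB table (illustrative only).
// Copies the domain of the record's company onto the record itself.
(function executeRule(current, previous /*null when async*/) {

    var companyId = current.getValue('company');
    if (!companyId)
        return;

    var company = new GlideRecord('core_company');
    if (company.get(companyId)) {
        // Force the record into the same domain as its company.
        current.setValue('sys_domain', company.getValue('sys_domain'));
    }

})(current, previous);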

 

3. The column length is just an example from my earlier implementations for a GET setup. When we're retrieving or receiving data with insertMultiple, we can potentially receive hundreds of devices in a single JSON, so that's just an estimate on my part. I'm not entirely sure whether the max length of the JSON field type is documented, but for an integer-type field, for example, it's 2,147,483,647 based on: https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB1362331
I suspect the behaviour is the same for most field types in ServiceNow: if you try to go past the maximum limit, it just automatically resets to the maximum. I haven't yet encountered a situation (well, because my column size is enormous :D) where the max length is smaller than the incoming JSON, but I would presume the JSON would be cut off at the point where the limit is reached. I wonder whether the structure of the JSON would get messed up if the message is bigger than the column, or whether it would just cut off the key-value pairs.

 

5. If a new attribute is introduced, there's no automatic mapping. When the new JSON is sent, you will have to open the ETL, retrieve the import set that contains the new attribute, and do the operations (mapping, transformations) needed.

 

6. The same principles should be followed as with other ways of transforming/extracting data: the cost of complex and unoptimized field transformations/scripts per row is multiplied by the number of items in your payload.

 

7. Capture all of your ETL configurations in an update set. If for some reason your ETL configurations aren't automatically captured, navigate to cmdb_inst_application_feed.list and add your ETL configs to the update set from there.

 

8. I've created a few file-type (format JSON) data sources for inbound integrations. The limitation there is that when a new attribute is introduced by the integration, you have to adjust your JSON file to include the new attribute, do a test load from the data source, check that the attribute was created as a field in your import set, update the length of that field, and adjust the mappings in ETL. With the JSON column setup you don't have to worry (up to a point) about field lengths, because you only have one field. The primary factor for my approach was that I knew from experience that the same setup works perfectly for GET integrations, so why wouldn't it work for POST as well, and I wanted to replicate the same solution so that architecture-wise it's easier to understand and implement.

 

Cheers!