
Instance Synchronizer
Welcome to Instance Synchronizer - created and maintained by Tom Hauri.
Overview
This simple application lets you synchronize data between ServiceNow instances by pulling data via REST. It uses JSON Configuration Manager to handle the configuration, Mapping Configuration Manager to manage data mappings, and the GlobalTools (WriteValidate) library for specific configuration, validation, mapping and error handling.
Disclaimer
Instance Synchronizer is NOT an officially supported ServiceNow product. Instance Synchronizer does NOT come with any kind of warranty. If you use it, you own it!
System Requirements
- Vancouver or later
- Access to https://github.com/haurit/now-instancesync
License
Copyright 2024 by Tom Hauri
Licensed under the Apache License, Version 2.0 (the "License")
You may not use Instance Synchronizer except in compliance with the License.
You may obtain a copy of the License at: https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Release Notes
1.0.0 - 06.2024
- Initial Release
1.0.1 - 08.2024
- Allow dedicated job configuration for mapped tables including query, delta query, chunk size and priority
- Simplify usage of table configurations and use initial JSON Name or sTableJSONPrefix with or without "_" delimiter
- Simpler scheduling options with iRunInterval without weekday configuration
- Simpler scheduling options with sRunStartTime and aRunStartTimes options without oSchedules configuration
- Added related list to Config with all related JSON files (kudos to Stephan Pfell for the great idea!)
1.0.2 - 09.2024
- Added DB indexes on Status field for all tables
- Added Table Cleaner for Job Log Record Error Table (clean warnings after 7 days)
- Fixed an issue with the daily scheduling (sRunStartTime)
1.0.3 - 10.2024
- Fixed a small issue where failed record updates were counted as successful
- New bLogErrorMappingToLastJobLog parameter (default = true): mapping error entries, which are unique per field/value, are always linked to the last Job Log where they occurred (before, one entry was created per Job Log)
- Dynamic Query respects bUseDeltaLoadQuery with new parameters aDynamicDeltaLoadQuery and aDeltaLoadQuerySysIds
1.0.4 - 11.2024
- Added parameters bUpdateCompareAttachment and bDeleteCompareAttachment for more granular control of the compare attachment handling
- Improved error handling when a job is initiated with an invalid configuration
1.0.5 - 12.2024
- Added the sQueryOrderBy parameter and a new default sort order for the REST query to avoid losing data. The sort is on the Created on [sys_created_on] and SysId [sys_id] fields. Please ensure that the source table has an index on Created on [sys_created_on] to improve performance of the REST query.
- Chunks are now triggered in descending order to prevent data loss caused by records being updated during the load.
- Fixed issue with orphaned attachments on record class change.
- Added "timer" system property. If set to true will write detailed performance information in an attachment on each chunk.
- Optimized performance by caching (transaction cache) values that can be reused for the same class.
1.0.6 - 01.2025
- Improved simple configuration (oTables) support
- Updated documentation for simple configuration
1.0.7 - 03.2025
- Improved status setting when no task is found for a job run
- Fixed a bug where "Run after this job" was not triggered when the initial job had nothing to do
- Fixed an edge case with multiple Class Maps
1.0.8 - 06.2025
- Added sQueryOverride parameter for easier testing and ad-hoc loads
- Added bDisplayValues parameter to fetch records including display value
Problem Statement
Traditional ways of synchronizing data between instances (IDR, RTE, simple REST, Table API) have some distinct disadvantages, especially if you want to transfer large amounts of data or large numbers of tables:
- Some technologies are not available for On-Premise / Self-Hosted customers
- Size limitations on the different technologies (max. initial seedings etc.)
- Each table requires distinct configuration, which leads to massive configuration effort (for example if you want to synchronize "the CMDB" with its thousands of tables and tens of thousands of attributes)
- Limitations on the different technologies restrict the ability to detect and correct data quality issues
Overview
Instance Synchronizer allows you to have one central place to handle your instance synchronization, with a minimal amount of configuration required if the two data models match. If there is a need to map classes or fields differently, the tool lets you do that very granularly and in an inheritable way, so that for each table in a hierarchy you only have to define the specific mapping once and all extended tables will pick it up automatically unless specifically overwritten at the lower level.
The other major advantage of the solution is that it scales to massive amounts of records and has very detailed log files that show exactly what the current state is, where errors occurred and how to overcome them. It also has built-in functions to overcome many common problems, such as 429 errors (source is overloaded) or response objects that are too large and force the use of smaller chunk sizes.
The very granular configuration options of the Global Tools library allow you to specify exactly how you want to handle data quality issues, including the option to use Mapping Configuration Manager to handle data mappings efficiently.
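Data is retrieved from the source instance via standard Table API queries (see Usage below). Purely as an illustration, a chunk-sized request might look roughly like this; the exact parameters are built by the application and this URL is only an example:
GET /api/now/table/cmdb_ci?sysparm_query=name!=All^ORDERBYsys_created_on^ORDERBYsys_id&sysparm_limit=2000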
Prerequisites
The Instance Synchronizer makes use of the Global Tools - YAGUSIS, Mapping Configuration Manager, and JSON Configuration Manager applications. Ensure that you have successfully installed these applications before running an Instance Synchronizer job.
If you need to transport deletes, please have a look at EBIS (GitHub).
Installation
- Create an account on GitHub - if not done already.
- Create a personal access token for your GitHub account.
- Add credentials to access GitHub - use "Basic Auth" (use GIT user id and personal access token as password).
- Fork the repository https://github.com/haurit/now-instancesync.
- Go to Studio and import the Instance Synchronizer application from source control.
Usage
- Create a new Connection in the "x_snc_instanc_sync.Instance_Sync_Remote_Instance" Connection & Credential Alias to connect to the remote system:
The Instance Synchronizer will use the Table API to retrieve data from the remote system, so the credentials used must have the rights to use the Table API and to read all the tables that you want to retrieve.
- Create an initial main configuration file for your copy job:
The available "job" attributes are documented in detail below; the "config" attributes relate to how the GlobalToolsRecordWriteValidate script functions and are documented under its oRecordConfig parameter there.
You can leverage a nested structure of JSON Configurations to override the behavior of both "job" (partial parameters) and "config" (all parameters) for each table in a table hierarchy (optional, you only have to define it if you need to change it). Please ensure that the inheritance on table level is done correctly, as the scripts will pick up the highest-level table definition and rely on the JSON Configuration inheritance feature.
- Create and configure the Copy Job.
- Start the job to test your configuration:
Use the job configuration parameter "bCalculateOnly" (documented below) to validate the setup without actually copying data; it sets the Job Log Table Chunks to "Cancelled".
Use the "Start Job" UI Action to start a new job (visible in the related list). Please note that the creation of the job itself is scheduled in the background (if you sync the whole CMDB, it takes a while to query all the tables and determine what should be copied). Once the tables and chunks have been created, you can use the "Cancel Job" UI Action to set the job and its job tables to cancelled as well.
- Once you have validated that the job would actually copy the right data, set "bCalculateOnly" to false (or comment it out) and start the job for real.
There are 3 ways to start a job:
- Manually with the "Start Job" UI Action
- By ticking the "Scheduled" flag and setting the oScheduleCfg correctly so that the job is automatically scheduled according to your needs (see the sketch after this list).
- By linking the current job to another job using the "Run after this job" reference field. A job will automatically be created once the job defined in "Run after this job" has been completed.
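For the scheduled option, a minimal sketch of the relevant part of the job configuration (attribute names as documented in the configuration reference below; the time value is just an example):
var oCfg = {
    job: {
        oScheduleCfg: {
            sRunStartTime: '04:30', // example: run the job once a day at 04:30
        }, // scheduler
    }, // job
};
Note that the x_snc_instanc_sync.scheduler.enabled property (see Properties below) must be true for the scheduler to pick up jobs.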
Job error handling
When a job fails, it usually has one of the following reasons and remediations:
- None of the tables defined have any valid data returned by the query defined. This also happens quite frequently during testing with delta queries when no data has actually changed. An error is only written to the log file if it is the first load; for obvious reasons (it is valid for this to happen) there is no error on delta loads.
- The number of records is not what you expect. In this case use the option to calculate the tables only without actually transferring data.
- The configuration is invalid and an error is returned (see the error code list below). To troubleshoot these, there are a couple of options in the configuration that store the actually used JSON files on the tables and the chunks.
- There is an error on the source system, in which case the system will try again later (the number of retries is defined by the system property below).
- If one of the above errors happens, you can retry manually with the "Restart Chunk" UI Action.
- For most data-related errors (on individual records), refer to the Global Tools (YAGUSIS) documentation for the WriteValidate function.
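For ad-hoc re-loads of specific records that failed, the sQueryOverride job parameter (introduced in 1.0.8 and documented in the configuration reference below) can be useful; a minimal sketch, assuming you already know the affected sys_ids (the sys_id values are placeholders):
var oCfg = {
    job: {
        // Overrides sQuery and sDeltaLoadQuery so only the listed records are loaded
        sQueryOverride: 'sys_idIN<sys_id_1>,<sys_id_2>',
    }, // job
};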
Job Configuration
The job configuration can be done in two ways: a simple configuration with only one file per job, or an enhanced configuration that allows for more flexibility and uses inheritance to define a base config (for example for "cmdb_ci") with the possibility to override settings for child tables (for example "cmdb_ci_hardware"), and which is built up from multiple JSON configuration files.
The documentation is not yet as easy to consume as I would like it to be, so please don't hesitate to write a comment to ask for help with the initial setup!
Simple Configuration
In the simple configuration all tables are configured directly in the "main" JSON attached to the Job Configuration.
var oCfg = {
job: {
bCalculateOnly: true, // Do not actually copy data, just prepare the tables and chunks for validation
iMaxClassSize: 10, // Max records copied for testing purpose
bDeltaLoad: true, // After the first job completed, add a delta query (start time to end time on the sys_updated_on field)
bDeltaCompareAttachment: true, // Use the compare attachment feature to speed up full loads
aDeltaCompareRemoveFields: ['last_discovered', 'sys_updated_by', 'sys_updated_on', 'sys_mod_count'], // Remove these fields from the compare attachment check
bTableAutoExtend: true, // Automatically extend root tables (for example cmdb_ci will create copy jobs for all CMDB tables that exist on the target system and have data on the source system)
iChunkSize: 500, // Initial chunk size (default: 2000)
oTables: { // Simple table configuration
core_company: { // Copy the company table
sQuery: 'nameNOT LIKEACME', // Copy only records that are found with this query
bTableAutoExtend: false, // Do not auto extend
config: { // Override the config for this table
sAction: 'update', // only update records, do not insert new ones
}, // config override on job (or config)
}, // companies
cmdb_ci: { // Copy the cmdb table
sQuery: 'name!=All^ORname=NULL', // Copy only records that are found with this query
}, // CIs
}, // minimal table config
oScheduleCfg: { // Schedule configuration
sRunStartTime: '15:10', // Use this to schedule a run once a day, for more complex scheduling please see advanced configuration
}, // scheduler
}, // job
config: {
bKeepSysId: true, // Keep the sys_id from the source system
oErrorHandling: { // Dedicated error handling setup
oErrorType: { // reject = default (will override all other error handling settings, so be careful)
value: 'warn', // Invalid value object will only create a warning
}, // error type
oFieldType: { // Define error handling based on field type
reference: { // Reference fields
sErrorHandling: 'mapping', // Use the Mapping Configuration Manager if installed
oByReferenceTable: { // For these reference field target tables, override the setting
sys_user: 'warn', // Only create warning error messages for missing users
alm_asset: 'ignore', // Don't create error messages for missing assets (will also work for extended tables)
} // by reference table
}, // ref type
choice: {
sErrorHandling: 'mapping', // Use the Mapping Configuration Manager if installed
}, // choice type
} // field type
}, // error handling
sAction: 'save', // Allow insert and update actions
} // record config
};
Enhanced Configuration
In this configuration the setup has multiple JSON configuration files. The "main" configuration is linked to the Job Config record, whereas the specific table configurations are found by naming convention: either the name of the main configuration plus the table name (which can be any table in the hierarchy), or the name defined in the job attribute sTableJSONPrefix. For example (illustrative): with a main configuration named "TestCMDBSync" (or sTableJSONPrefix: 'TestCMDBSync_'), a table-specific configuration for cmdb_ci_hardware would be named "TestCMDBSync_cmdb_ci_hardware".
As described above, the JSON configuration has two root attributes:
- "job" attribute definition
///////////////////////////////////////////////////////////////////////////////////////////////////////////
// DEFAULT SETTINGS START --> can be overwritten in table specific configuration
///////////////////////////////////////////////////////////////////////////////////////////////////////////
bLog: true, // (Optional) Default: false, will add additional logs depending on the sLogLevel defined.
sQueryOverride: 'sys_id=', // Overrides the sQuery and sDeltaLoadQuery from the table configuration. This is useful when you have an Adhoc Load configuration and want to load only specific data, but use the same configuration as in the main job.
bTableAutoExtend: true, // (Optional) Default: false, if true will automatically create jobs for each table extending the initial table. Please note that all tables share the same "job" configuration as the initial table.
bForceClassQuery: true, // (Optional) Default: false, if true will force a strict class query even if on the target the table is not extended (but on the source there are extended tables)
bRunIREAgainAfterInsert: false, // (Optional), if set to true will run IRE again after insert to populate Multisource CMDB if bKeepSysId and bUseIRE are true, because IRE does not allow keeping the sys_id so the script will automatically use a regular insert.
bDeltaLoad: true, // (Optional) Default: false, if true will add the delta timestamp query from the last job start time
bDeltaCompareAttachment: true, // (Optional) Default: false, if true will add a hidden JSON attachment to each record and use the hash to detect changes
bUpdateCompareAttachment: true, // (Optional) Default: false, if true will save the compare attachment even if bDeltaCompareAttachment is false.
bDeleteCompareAttachment: true, // (Optional) Default: false, if true will delete the hidden attachment when both bDeltaCompareAttachment and bUpdateCompareAttachment are false.
iDeltaLoadOverlapSeconds: -2, // (Optional) Default: -1, seconds where the last load and the new load overlap to take into account database lags when writing
bUseDeltaLoadQuery: true, // (Optional) Default: false, if true will use sDeltaLoadQuery instead of sQuery during delta load. This is for specific use cases where you want the same config "in reverse" to catch updates to the state that you exclude.
sDeltaLoadQuery: '', // (Optional), Default: empty, used in combination with bUseDeltaLoadQuery to provide an alternative query, for example for update-only changes
sDeltaLoadStartTime: 'yyyy-MM-dd HH:mm:ss', // (Optional), if set will use this time in the initial load as the delta load start time (format is mandatory, will use the system user timezone).
aDeltaCompareRemoveFields: ['sys_updated_by', 'sys_updated_on', 'sys_mod_count'], // (Optional) Default: [], will remove the fields from the JSON when using bDeltaCompareAttachment
sDeltaCompareFileNamePrefix: 'instsync', // (Optional) Default: 'instsync', if set will take this as the prefix for the hidden attachment file name
sQuery: '', // (Optional), default query to be applied against each table to fetch the records
bDisplayValues: true, // (Optional), Default: false, if true will use the "all" parameter to fetch the REST object including value and display_value attributes. Allows the usage of "bDisplayValue" in the record field configuration.
sQueryOrderBy: '', // (Optional), Default: ORDERBYsys_created_on^ORDERBYsys_id, order by query to be applied (ORDERBY or ORDERBYDESC + <field>)
aDynamicQuery: [ // (Optional), setting a dynamic query is recommended only for initial loads. It ignores the sQuery, iChunkSize, oClassMap, bDeltaLoad, bTableAutoExtend etc. parameters
    {
        sTargetType: 'remote', // local or remote
        sTargetTableName: 'table_name', // table to query
        sTargetQuery: 'query_to_be_applied', // query to be applied to the table
        sSourceIdField: 'aggregate_field', // field that contains the sys_id of the record to be transferred
    },
], // dynamic query
aDynamicDeltaLoadQuery: [], // (Optional), same as aDynamicQuery but taken when bUseDeltaLoadQuery is true
aQuerySysIds: ['sys_id_1', 'sys_id_2'], // (Optional), same as sQuery "sys_idINsys_id1,sys_id2" but using the dynamic mechanism and storing the condition on the chunk (not the table).
aDeltaLoadQuerySysIds: [], // (Optional), same as aQuerySysIds but taken when bUseDeltaLoadQuery is true
iChunkSize: 2000, // (Optional) Default: 2000, chunk size to be applied for the REST call
iMaxClassSize: 10, // (Optional), Default: empty, if set will only take this number of records per table/class
bCalculateOnly: true, // (Optional), Default: false, if true will set the chunks to completed so they are not actually executed
bAttachTableJobJSONConfig: true, // (Optional) Default: false, will add a JSON attachment with the job config for each table when the table is created
bAttachChunkRecordJSONConfig: true, // (Optional) Default: false, will add a JSON attachment with the record config for each chunk at runtime of the chunk sync
bAttachChunkJobJSONConfig: true, // (Optional) Default: false, will add a JSON attachment with the job config for each chunk at runtime of the chunk sync
///////////////////////////////////////////////////////////////////////////////////////////////////////////
// MAIN SETTINGS START --> cannot be overwritten in table specific configuration
///////////////////////////////////////////////////////////////////////////////////////////////////////////
aTables: ['table_one', 'table_two'], // The tables to be copied. If bTableAutoExtend is true, will take ALL the tables that extend this table (whole hierarchy), so for example you can just specify "aTables: ['cmdb_ci']" and "bTableAutoExtend: true" to copy the whole CMDB. The script will automatically break down the tables in the Job Log Tables so that they are copied individually, respecting all the valid fields per table.
sTableJSONPrefix: 'TestCMDBSync_', // (Optional), Default: initial JSON name, "_" as delimiter is automatically added if required.
iPriorityDefault: 300, // default priority if no specific priority is defined (default: 1000, max: 100'000)
oClassMap: { // for each table in the list the respective source table will be mapped to the target table, multiple source tables can be mapped to the same target table. You need to map fields in the record field mapping.
    u_custom_source_table: 'target_table', // source to target table mapping
}, // class mapping
oPriorityClasses: {
    25: ['core_company'],
    50: ['cmdb_model'],
    100: ['cmdb_ci_cluster', 'cmdb_ci_computer'], // computers and extended classes are referenced in other classes, clusters before cluster nodes
    200: ['cmdb_ci_apache_web_server', 'cmdb_ci_app_server_jboss'], // principal classes
    // 300: this is the default, so any table not explicitly mentioned will get this
    400: ['cmdb_ci_ip_address'],
    500: ['cmdb_ci_network_adapter'],
    600: ['alm_asset'],
}, // priority classes
sRunStartTime: '03:00', // (Optional), Default: 02:00
aRunStartTimes: ['03:00', '15:00'], // (Optional), Default: 02:00
oScheduleCfg: { // (Optional): if "Scheduled" is checked in the configuration and nothing is defined, the script will run daily at 02:00 in the morning.
    sRunStartTime: '03:00', // (Optional), Default: 02:00
    aRunStartTimes: ['03:00', '15:00'], // (Optional), Default: 02:00
    iRunInterval: 1440, // (Optional), will override sRunStartTime, aRunStartTimes and the week day specific configuration
    oWeekendDays: { // (Optional), if you need more specific schedules on weekend days
        iRunInterval: 120, // only one job on SAT/SUN
        aBlockedHours: [
            { sStartTime: '00:00', sEndTime: '21:00' }, // do not run during the day
            { sStartTime: '22:00', sEndTime: '23:59:59' }, // do not run in the night
        ], // hours where the schedule is not running
    }, // config for weekend
    oWeekDays: { // (Optional), if you need more specific schedules on week days
        iRunInterval: 60,
        aBlockedHours: [
            { sStartTime: '06:30', sEndTime: '08:30' }, // do not run in the morning when people log in
            { sStartTime: '20:00', sEndTime: '23:59:59' }, // do not run in the night when other jobs are running
        ], // hours where the schedule is not running
    }, // config for weekdays
}, // schedule cfg
// Error handling
bLogErrorDetails: true, // (Optional) Default: false, will log the error details in the error map table
aLogErrorDetailTypes: ['value'], // (Optional) Default: ['all'], for available values see the GlobalTools documentation
bLogErrorMappingToLastJobLog: false, // (Optional) Default: true, if false will create a new mapping error record for each unique field (incl. table ref)/value combination and link it to the current job, if true will only map it against the last job where it was found.
- "config" attribute definition
The GlobalToolsRecordWriteValidate Script Include function "validateAndWrite" is called by the Instance Synchronizer for each record (the JSON of the record is passed as oRecordValues), and the "config" attribute is passed as oRecordConfig. Please also find details about the returned errors in the Global Tools - YAGUSIS community article.
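Putting the two root attributes together, a minimal "main" configuration for a full CMDB copy might look like the following. This is an illustrative sketch only; the attribute names are taken from the reference above and the values are examples:
var oCfg = {
    job: {
        aTables: ['cmdb_ci'], // start from the CMDB root table
        bTableAutoExtend: true, // create jobs for every extending table that has data on the source
        bDeltaLoad: true, // after the first run, only load records changed since the last job start
        iChunkSize: 2000, // chunk size for the REST calls
        oScheduleCfg: {
            sRunStartTime: '03:00', // run once a day at 03:00
        }, // scheduler
    }, // job
    config: {
        bKeepSysId: true, // keep the sys_id from the source system
    }, // record config
};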
Properties
- x_snc_instanc_sync.available.triggers The maximum number of concurrent triggers to be created that will run the data jobs.
- x_snc_instanc_sync.max.retry The number of retry attempts to connect to the remote instance for a specific chunk of data.
- x_snc_instanc_sync.scheduler.enabled If true the scheduler will pick up jobs and create runners, if false will do nothing.
- x_snc_instanc_sync.logging.verbosity Control logging verbosity as described in KB0714743.
- x_snc_instanc_sync.timer Adds an attachment with detailed performance information to each chunk record.
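These are regular system properties. If you want to check them from a background script, they can be read with the standard GlideSystem call; a minimal sketch (the fallback values are assumptions, not documented defaults):
// Property names as listed above; the second argument is only a fallback if the property is missing
var bSchedulerEnabled = gs.getProperty('x_snc_instanc_sync.scheduler.enabled', 'false') == 'true';
var iMaxRetry = parseInt(gs.getProperty('x_snc_instanc_sync.max.retry', '0'), 10);
gs.info('Scheduler enabled: ' + bSchedulerEnabled + ', max retries per chunk: ' + iMaxRetry);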
Error codes
- -97 Try catch error.
- -98 No SysId in payload.
- -99 No error details.
- -100 Error loading copy job table or copy job.
- -101 Error loading copy job config or table name empty.
- -102 Error scope is not writable.
- -103 Undefined error
- -104 Undefined error during rest query.
- -105 Error string size too big.
- -106 Config not found.
- 4xx Errors with a positive number come directly from the REST call to the Table API.
Future ideas
- Provide dashboards with load statistics
- Detect indexes with "unique keys" constraints when an insert fails without generating proper error messages.