Giles Lewis
Giga Guru

DataPump is a contributed application which can be used to export ServiceNow data to MySQL, Oracle, Microsoft SQL Server or PostgreSQL. This application has two parts:

  • A scoped ServiceNow app (x_108443_sndml) which is used to configure and manage export jobs. This app can be downloaded from the ServiceNow Developer Connect Share site. Go to https://developer.servicenow.com/connect.do#!/share/user/content and search for "DataPump". The app must be installed in the ServiceNow instance from which the exports will be sourced.
  • A Java application (a.k.a. agent) which runs the exports. The Java app can be downloaded from https://github.com/gflewis/sndml3/releases. The ZIP file includes separate JAR files for MySQL, Oracle, Microsoft SQL Server and PostgreSQL. The Java app can be run from a command line prompt on a Linux or Windows server.

This article describes the steps to configure DataPump.

There are two additional articles in this series:

Create a Daemon User and Grant Roles

After installing the Update Set and downloading the JAR file, the first step is to create a new ServiceNow user with the User ID "datapump.daemon". The Java agent will use this account to communicate with the ServiceNow instance.

  • Assign the user a randomly generated password
  • Set the user's time zone to GMT
  • Grant the role x_108443_sndml.daemon

The daemon user requires "read" access to ServiceNow tables which will be exported. If there are existing roles (such as "itil") which grant this access, then grant those roles to the daemon user. Otherwise, you will need to configure ACLs which grant the necessary read access to the daemon user.

If you are testing in a PDI or sub-production instance, then you can grant "admin" role to the daemon user. However, this is not recommended in production.

Users with x_108443_sndml.admin role can configure and monitor DataPump jobs. However, only a ServiceNow administrator can grant the daemon user permission to read ServiceNow data tables.

Create a Database Agent Record

In your ServiceNow instance, go to DataPump > Agents and click New. Create a new Database Agent record with the name "main".

Create a Connection Profile

The connection profile is a Java properties file that contains credentials for the database and the ServiceNow instance, as well as other parameters that affect processing. The connection profile looks like this:

servicenow.instance=dev00000
servicenow.username=datapump.daemon
servicenow.password=secretpassword
datamart.url=jdbc:mysql://name-of-database-host/myschema
datamart.username=datapump
datamart.password=secretpassword
datamart.schema=myschema
datamart.autocreate=true
daemon.scope=x_108443_sndml
daemon.agent=main
daemon.interval=120
daemon.threads=3
daemon.continue=false

Since the connection profile contains passwords, it should be in a secure location on your Linux or Windows server.
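
For example, on Linux you can restrict the profile so that only the owning user can read it (the file path here is illustrative):

chmod 600 /home/datapump/sndml.profile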

Please note the following:

  • datamart.url format varies depending on whether you are using MySQL, PostgreSQL, Oracle or Microsoft SQL Server. Refer to your database's JDBC documentation for the exact URL syntax; typical formats are shown in the examples after this list.
  • datamart.autocreate causes tables to be automatically created in the target database if the tables do not already exist.
  • daemon.agent must match the agent name created in the prior step.
  • daemon.interval causes the agent to check for new jobs every 120 seconds.
  • daemon.threads is the maximum number of jobs that the agent can process concurrently. Additional jobs will wait until a thread becomes available.
  • daemon.continue controls whether the agent continues running after encountering an I/O error in communication with ServiceNow or the database. If "true", the agent writes an error to the log and goes back to sleep. If "false", it aborts immediately.
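
As a rough guide, datamart.url values typically look like one of the following (host names, ports, database and schema names are placeholders; check your JDBC driver documentation for the exact syntax and options):

datamart.url=jdbc:mysql://dbhost:3306/myschema
datamart.url=jdbc:postgresql://dbhost:5432/mydatabase
datamart.url=jdbc:oracle:thin:@//dbhost:1521/myservice
datamart.url=jdbc:sqlserver://dbhost:1433;databaseName=mydatabase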

Be sure to verify in advance that the datamart user has the ability to create new tables in the datamart schema.
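
The exact statement depends on your database. As an illustration, on PostgreSQL the grant might look like this (the schema and user names match the sample profile above):

GRANT USAGE, CREATE ON SCHEMA myschema TO datapump;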

For additional information on the connection profile refer to https://github.com/gflewis/sndml3/wiki/Connection-Profile.

Run the Java Application

There are two ways to run the Java application:

--scan will cause it to run once and terminate. The "interval" and "continue" properties are ignored.

--daemon will cause it to run in an endless loop until interrupted by Ctrl-C or a signal. The app will periodically poll ServiceNow for new jobs based on the configured interval.

You can run the application from the command line using the following command:

java -jar <jarfilename> -p <profilename> --daemon

<profilename> is the name of the connection profile created above.

<jarfilename> is one of the following:

  • sndml-3.4.1-mysql.jar - MySQL
  • sndml-3.4.1-pg.jar - PostgreSQL
  • sndml-3.4.1-ora.jar - Oracle
  • sndml-3.4.1-mssql.jar - Microsoft SQL Server
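
For example, to run the PostgreSQL build as a daemon with a connection profile named sndml.profile (an illustrative file name), the command would be:

java -jar sndml-3.4.1-pg.jar -p sndml.profile --daemon

To perform a single scan and exit, replace --daemon with --scan.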

After starting the agent app, view the previously configured Database Agent record in ServiceNow. Confirm that the Last Check-in date has been updated.

Configure a Database Table and a Job

For initial testing, choose a ServiceNow table which has a relatively small number of rows.

  1. Go to DataPump > Agents.
  2. Open the "main" agent configured above.
  3. Click the New button above the Tables related list.
  4. Select a Source table.
  5. Save the record.
  6. Click the New button above the Jobs related list.
  7. For Action type select "Insert".
  8. Save the record.

Run a Job

During normal operations, jobs are run via schedules. However, for testing or initial loading we can run jobs manually.

  1. Open the Job record configured in the prior step.
  2. Click Execute Now.
  3. Note that a new Job Run record is created. The Status of the Job Run is initially set to "Ready".

When the Java agent next runs, it will detect any Job Runs with a Status of "Ready" and change the Status to "Running". If the Java agent is not already running, then run it now.

Verify the Data

Use any SQL query tool to verify that the table has been created in the designated schema and that it contains data.
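
For example, if you exported the incident table into the schema from the sample profile, a quick check might be:

SELECT COUNT(*) FROM myschema.incident;

The row count should correspond to the number of records reported by the Job Run.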

Action Types

There are several types of jobs.

Insert

"Insert" is used for initial loading or reloading of SQL tables. It inserts rows into the target table. If a record with the same sys_id already exists in the target table, then a primary key violation will occur and the row will be skipped.

If Truncate is checked, then the SQL table will be truncated prior to the load.

Upsert

"Upsert" is used to load or update SQL tables. If the target record exists (based on sys_id), then it will be updated. Otherwise, it will be inserted.

If Since Last is checked, then only records inserted or updated in ServiceNow since the last run will be processed. The following filter will be used when retrieving records from ServiceNow:

sys_updated_on>=lastrunstart

where lastrunstart is determined from the "Last Run Start" field on the Database Table record.

Sync

"Sync" compares the timestamps (sys_updated_on) in the source and target tables. Based on this comparison it will insert, update or delete target records. If the values of sys_updated_on match, then the record will be skipped.

If a Filter has been configured for the Database Table, the Sync will delete any records which do not match the filter.

Create

"Create" creates a new empty table in the SQL database. If autocreate has been set to true in the connection profile, then this should be unnecessary.

Execute

"Execute" executes an arbitrary SQL statement. This is typically used to run a database stored procedure.

Comments
Jamsta1912
Tera Guru

Hi Giles,

I'm part way through setting this up. I have DataPump installed in a DEV instance and will be working with some colleagues shortly to set up the agent / target DB, ready for some testing. So far, it seems like a neat, intuitive solution, easy to configure.

I want to ask: presumably, since under the hood this implementation uses the table APIs, the exported column names will be the ServiceNow DB field names, rather than the field labels. Is that correct? If so, would you be able to suggest a mechanism for renaming the columns, as in the target DB we would ideally like to see ServiceNow field labels.

Thank you.

Jamie

 

Giles Lewis
Giga Guru

Hi Jamie.

You will need to create database views.

You will actually need database views for two reasons:

  • You can use views to give more user friendly names to the columns.
  • More importantly, you will need to create views to resolve reference fields. For example, if you are reporting on the Incident table and you want to show the name of the Caller or Assigned To, then you will need to join (possibly multiple times) to sys_user because SNDML will only load the sys_id.

I recommend using two separate schemas in your database:

  • One schema for the raw tables which are created and managed by DataPump / SNDML.
  • A separate schema for end user reporting. The reporting schema does not contain any data. It just contains database views of the data in the first schema.

An OOB table like "incident" will contain a large number of fields which may be empty or irrelevant in a particular context. When you create database views, you will only include those fields which contain meaningful data that is likely to be useful in the given context.
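
As a minimal sketch, assuming the raw tables live in a schema named datapump and the views in a schema named reporting (both names are illustrative), a view that renames columns and resolves the Caller and Assigned to references might look like this. The double-quoted aliases are ANSI SQL; MySQL requires backticks (or ANSI_QUOTES mode) and SQL Server also accepts square brackets:

CREATE VIEW reporting.incident_report AS
SELECT inc.number            AS "Number",
       inc.short_description AS "Short description",
       caller.name           AS "Caller",
       assignee.name         AS "Assigned to"
FROM datapump.incident inc
LEFT JOIN datapump.sys_user caller   ON caller.sys_id = inc.caller_id
LEFT JOIN datapump.sys_user assignee ON assignee.sys_id = inc.assigned_to;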

Regards,

Giles

 

Jamsta1912
Tera Guru

Thank you Giles. I appreciate this advice.

Regards

Jamie

Michele30
Tera Guru

Hello Giles,

I really appreciate your work and we are using the DataPump app successfully in our environment.

 

We have only one issue: when a job fails due to a record that cannot be exported (for any reason), is there a way to skip that record automatically and continue exporting the other records?

 

Many thanks for your help!

Giles Lewis
Giga Guru

Hi Michele. It is not easy to automatically recover from a record failure. Usually when there is a record failure, the failure occurs within the JSON parser. SNDML uses the Jackson JSON parser, and the code will be attempting to parse a block (i.e. page) of records. Because the JSON contains multiple records, when the parse fails there is no way to know the sys_id of the record that caused the failure. I could modify the Java code to abandon the entire block, but that might be less than helpful. Generally the failure is caused by a corrupted record in the ServiceNow database. If you know the sys_id and you attempt to view the record from a ServiceNow form, it will generally cause your browser to freeze. If you know the sys_id, then the best solution is usually to write a script to delete the record (since it cannot be viewed or deleted from the GUI). But it is hard to find the sys_id.

 

We have some information because records are processed in sys_id order. We know that the sys_id which failed is greater than the last sys_id in the previous successful block. But it is not necessarily the next record. The parsing error could have occurred 50 records into the block. In theory you could find the problem record by using a filter of "sys_id>last_rec" where last_rec is the last record that successfully loaded and then load records using a page size of 1 (which will be very slow). In this case (since the page size is 1) the bad sys_id will be the next one that matches the filter after the last successful record, so you could write a script to find it. 

Hellfried
Tera Contributor

Dear Giles, 

How can we configure a query timeout for MS SQL Server?

Importing large tables will time out, unfortunately.

Michele30
Tera Guru

Hello Giles,

Very good work. We have been using the DataPump app for 3 years now and we are super happy with it!

 

Unfortunately, after the Xanadu upgrade the installed version started failing very often and we don't know why.

 

We upgraded to version 3.5 but the issue still persists.

 

This is the command we run to start the import:

java -Dlog4j2.configurationFile=log4j2-daemon.xml -Dsndml.logFolder=Logs -Dsndml.logPrefix=main -jar sndml-3.5.0.09-mssql.jar -p prod_profile.txt --scan

 

I also tried switching to --daemon but the issue is still there.

 

The profile is the following:

app.instance=xxxxxxx
app.username=datapump.daemon
app.password=xxxxxxx
app.agent=main
reader.instance=xxxxxxx
reader.username=datapump.daemon
reader.password=xxxxxxx
database.url=jdbc:sqlserver://xxxxxxx:1113;databaseName=xxxxxxx;trustServerCertificate=true;integratedSecurity=true;
database.username=admin
database.schema=dbo
database.autocreate=true
database.warn_on_truncate=true
daemon.interval=60
daemon.continue=false
server.threads=4
server.shutdown_seconds=30

 

The error below is triggered after the job has started and some data has already been processed:

 

01-15 14:29:12 INFO RestTableReader [DPRUN0078073] INIT cmdb_rel_ci: Starting (8780 rows)
01-15 14:29:14 INFO RestTableReader [DPRUN0078073] PROCESS cmdb_rel_ci: Processed 2000 / 8780
01-15 14:29:18 INFO RestTableReader [DPRUN0078073] PROCESS cmdb_rel_ci: Processed 4000 / 8780
01-15 14:29:18 ERROR ScannerJobRunner [DPRUN0078073] ERROR cmdb_rel_ci: sndml.agent.ScannerJobRunner.call: com.fasterxml.jackson.core.JsonParseException
com.fasterxml.jackson.core.JsonParseException: Unexpected character ('<' (code 60)): expected a valid value (JSON String, Number, Array, Object or token 'null', 'true' or 'false')
at [Source: REDACTED (`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION` disabled); line: 1, column: 2]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2481) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:752) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:676) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:2088) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:812) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4928) ~[sndml-3.5.0.09-mssql.jar:?]
at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:3278) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.servicenow.JsonRequest.execute(JsonRequest.java:64) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.servicenow.RestTableAPI.getRecords(RestTableAPI.java:118) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.servicenow.RestTableReader.call(RestTableReader.java:104) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.loader.JobRunner.runLoad(JobRunner.java:279) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.loader.JobRunner.call(JobRunner.java:122) ~[sndml-3.5.0.09-mssql.jar:?]
at sndml.agent.AppJobRunner.call(AppJobRunner.java:106) [sndml-3.5.0.09-mssql.jar:?]
at sndml.agent.ScannerJobRunner.call(ScannerJobRunner.java:33) [sndml-3.5.0.09-mssql.jar:?]
at sndml.agent.ScannerJobRunner.call(ScannerJobRunner.java:13) [sndml-3.5.0.09-mssql.jar:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:833) [?:?]
01-15 14:29:18 ERROR ScannerJobRunner [DPRUN0078073] ERROR cmdb_rel_ci: Critical error detected. Halting JVM.

 

Can you please help us understand what the issue might be? 🙂

Thanks a lot!


Giles Lewis
Giga Guru

@Michele30  This is a JSON parse exception. It occurs when the REST table API returns a block of records as a JSON message, but the message cannot be parsed because it is not valid JSON. In all the cases that I have seen, the error indicates a corrupted record (or multiple records) in ServiceNow. It will cause SNDML to fail consistently: always on the same table, whenever there is an attempt to read the corrupted record. I would not expect the SNDML version to make a difference. There are a huge number of changes in SNDML 3.5, but the JSON parser has not changed since switching to the Jackson parser in 3.4.0 (March 2021). It is interesting that it happened following your Xanadu upgrade. I am wondering if it actually happened DURING your upgrade. I would suggest splitting the job into three pieces:

  • Records created prior to the start of the Xanadu upgrade
  • Records created during the Xanadu upgrade
  • Records created after the Xanadu upgrade was completed

Can I ask you to open a new issue on the GitHub page (https://github.com/gflewis/sndml3/issues) for further tracking of this issue? Thanks.

Giles Lewis
Giga Guru

@Hellfried Sorry for the delayed response.

Timeouts will occur if 

  • The Page Size is too large
  • You are running too many jobs at the same time
  • You are using too many threads in a partitioned load

The easiest way to cause a timeout is to run a partitioned load of a large table with a large number of threads. My recommendation is that the total number of concurrent jobs + threads should typically not exceed 4.

 

Please open an issue on the GitHub page (https://github.com/gflewis/sndml3/issues) if you are still having issues. Thanks.

Giles Lewis
Giga Guru

This article has been replaced with a new article on github.io.  

Please refer to:
