Copying Service-Now data on-site for reporting purposes
06-12-2015 08:27 AM
During my work to achieve an on-site copy of ServiceNow data (normally held in the Cloud), I have learnt many things. This article outlines some points for consideration in achieving this goal.
There seems to be a lack of discussion about this topic, so I thought it might be good to share my experiences. I hope you find the ideas in this article useful and welcome any additional comments if you've also been faced with similar challenges.
Firstly, keeping a copy of your ServiceNow data within your company's data-centre is not only possible and practical, it is probably quite wise. I won't delve into the reasons here because this is primarily a technical article, but making use of the wealth of Business Intelligence and Reporting solutions available to most companies, and integrating your ServiceNow data with other tools or in-house solutions, both depend on effective access to your service management data.
My target database is a SQL Server instance hosted on-site for reporting via SSRS, Business Objects, Crystal Reports or any of the myriads of other reporting tools that are capable of advanced reporting from SQL Server based data. I started on SQL Server 2008 and am now working with SQL Server 2012.
1 Choices
The initial solution I used was the ServiceNow ODBC driver. We essentially out-grew this in terms of data volumes and have recently purchased SnowMirror.
1.1 ServiceNow ODBC Driver
The ServiceNow ODBC driver originates from a third party (DataDirect), but has been adapted by ServiceNow to use a variation on MySQL syntax to retrieve data from the ServiceNow Database. I haven't found any maintained documentation of the SQL syntax, so my advice is to start with MySQL flavour SQL and "try it and see" approach.
Don't expect to be able to maintain very current (within hours) data in your on-site copy using ODBC — we have managed to get some deltas running as frequently as 3 hourly, but it's a struggle to support this when things go wrong. The large UK company I work with has nearly 3 million rows of data on the task table after running ServiceNow for 4 years. It takes more than a day to copy that much data.
1.1.1 SQL Views
In SQL Server, you can write a View definition that uses SQL like that below to expose the data directly from the ServiceNow instance. To connect the ODBC driver to SQL Server you need to define a Linked Server. My linked server is called "ServiceNow", so references in the SQL to "ServiceNow" will connect through that Linked Server definition, then through the ODBC driver, to the ServiceNow Instance defined in the Driver.
SELECT *
FROM OpenQuery (
SERVICENOW,
'SELECT company,
dv_cost_center,
dept_head,
description,
id,
name,
parent,
primary_contact,
dv_u_division,
sys_id,
sys_created_by,
CAST (sys_created_on AS TIMESTAMP) AS sys_created_on,
sys_updated_by,
CAST (sys_updated_on AS TIMESTAMP) AS sys_updated_on,
CAST (sys_mod_count AS FLOAT) AS sys_mod_count
FROM cmn_department')
The OpenQuery function in SQL Server simply tells SQL Server to pass the query directly to the target (in this case ServiceNow) without trying to interpret it.
Note that some fields need to be Cast for the data to make sense when it is shown. This usually precludes use of SELECT * FROM table as a generic solution.
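For completeness, defining the Linked Server itself is a one-off task. A minimal sketch is shown below; it assumes a system ODBC DSN called ServiceNow_DSN has already been created for the ServiceNow ODBC driver, and the user name and password are placeholders:
-- Create a linked server over the ServiceNow ODBC driver using the MSDASQL provider.
-- The DSN name 'ServiceNow_DSN' is an assumption; use the DSN you configured.
EXEC master.dbo.sp_addlinkedserver
     @server     = N'SERVICENOW',
     @srvproduct = N'ServiceNow',
     @provider   = N'MSDASQL',
     @datasrc    = N'ServiceNow_DSN';

-- Map local logins to the ODBC credentials (placeholder user name and password).
EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname  = N'SERVICENOW',
     @useself     = N'False',
     @locallogin  = NULL,
     @rmtuser     = N'odbc_report_user',
     @rmtpassword = N'********';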
1.1.2 Performance
Here is the first hurdle. The query above is against what is probably one of the smaller tables you'll want to use in reports, and it took 7 seconds to retrieve just 57 rows. That time is made up of 3 factors:
• ServiceNow processing overhead in translating the query into GlideRecord logic
• Bandwidth and latency of the pipe between your SQL Server and the ServiceNow Cloud instance
• Time taken for ServiceNow to retrieve the records from its server.
The 7 seconds for 57 rows on this table translates into many hours for something like the task or incident tables, assuming you can get the connection to remain stable long enough to complete the query.
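To put that in perspective: 7 seconds for 57 rows is roughly 0.12 seconds per row, so a naive extrapolation to a 3 million row task table gives something in the region of 100 hours. In practice the per-query overhead is amortised over larger result sets, which is why a full copy "only" takes a day or more rather than four, but the scale of the problem should be clear.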
1.1.3 Deltas
Our solution was to use SQL Server Stored Procedures for copying the data, and in some cases deltas to keep the data up to date.
Deltas will only Add and Update rows on the copy; they do not provide a solution for full maintenance (i.e. including Deletion of rows), so occasional full copies (or rebuilds) are necessary.
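As a rough illustration (not our exact production code), a delta extract combined with an upsert into the local copy might look something like this; the local table name, the column list and the date literal format are assumptions you would need to verify against your own instance:
-- Pull only the rows changed since the last run from ServiceNow (MySQL-flavour SQL
-- inside OpenQuery) and upsert them into the on-site copy. Names are illustrative.
MERGE dbo.cmn_department_Tbl AS target
USING (SELECT *
       FROM OpenQuery (SERVICENOW,
            'SELECT sys_id,
                    name,
                    dv_cost_center,
                    CAST (sys_updated_on AS TIMESTAMP) AS sys_updated_on
             FROM cmn_department
             WHERE sys_updated_on > ''2015-06-12 05:00:00''')) AS source
ON target.sys_id = source.sys_id
WHEN MATCHED THEN
    UPDATE SET target.name           = source.name,
               target.dv_cost_center = source.dv_cost_center,
               target.sys_updated_on = source.sys_updated_on
WHEN NOT MATCHED BY TARGET THEN
    INSERT (sys_id, name, dv_cost_center, sys_updated_on)
    VALUES (source.sys_id, source.name, source.dv_cost_center, source.sys_updated_on);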
You also need to consider changes in table structures, new fields, and some other factors which I'll now outline.
1.1.4 New Fields
With the ODBC method, new fields will only get included if you change the query. You'll need to integrate such activity into your change control procedures, or during testing of ServiceNow patch upgrades.
1.1.5 The 6 connection limit
The ServiceNow instance limits the number of concurrent threads that can be connected at any time. At the time of writing that limit is 6 threads. If you have the kind of IT shop where lots of developers try to independently write solutions to pull data from the ServiceNow instance using ODBC, you may reach a point where your attempts to operate a single, central repository connecting via this mechanism fail to find any available connections. It is therefore preferable to take a company-wide strategy of pulling data on-site into one single place and allowing developers and reporters to connect to that copy, rather than allowing multiple copies to arise.
1.1.6 Table Sizes
Table size can be measured in Width and Length; where Width is the maximum number of bytes a row can consume, and the Length is the number of rows on the table. Some ServiceNow tables are very wide. We found copying TASK and INCIDENT tables challenging, and had to develop a strategy of splitting the columns into multiple queries, resulting in multiple tables in the SQL Server copy, which can either be re-combined using Views, or re-constituted into a table that looks similar to that on ServiceNow, using Stored Procedures or SSIS.
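As a simple illustration of the re-combination step (the table and column names below are made up for the example), a view can stitch the column groups back together on sys_id:
-- Re-combine an extract that was split into two column groups, joined on sys_id.
-- The _part1/_part2 table names and the column list are illustrative placeholders.
CREATE VIEW dbo.incident_vw
AS
SELECT a.sys_id,
       a.number,
       a.short_description,
       a.sys_updated_on,
       b.description,
       b.close_notes
FROM dbo.incident_part1_Tbl a
     INNER JOIN dbo.incident_part2_Tbl b
         ON b.sys_id = a.sys_id
GO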
1.1.7 Enable GlideAggregate plugin
We initially struggled to get delta queries to perform any faster than full table queries (where a delta query looks for sys_updated_on within the last x hours or days). This is because of the way ServiceNow interprets the query: it still had to internally copy the entire table before it could apply the date-based WHERE clause. A solution to this was turning on the Aggregate Services plugin, which enables GlideAggregate queries to be used within the ServiceNow side of the processing. You don't need to change your SQL to make this happen, but trust the ServiceNow ODBC interpreter to work it out once the plugin is available.
1.1.8 What you see is not what you get
What you see online through the ServiceNow console is not what you see when you query via ODBC. You are essentially copying raw tables, and need a good understanding of the schema and table relationships. This is well documented via the Tables and Columns module in the ServiceNow console, and use of the Schema Map diagrams it can produce.
1.1.9 Reference fields
Reference fields will contain the sys_id of the target record, which exists on another table. So you either need to copy both the main table and all the tables it references, or use "DV" columns like the one shown in the example SQL statement for the cost centre field. Prefixing a column name with DV_ tells the ServiceNow interpreter to resolve the reference and send you the descriptive field set as the display field on the referenced table. Other ways to resolve this on the on-site database include LEFT JOINs to the reference table, or sub-queries to the reference table to expose the field values.
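For example, the LEFT JOIN approach on the on-site database might look like this, assuming the referenced table has also been copied (the local table names are illustrative):
-- Resolve a reference field locally by joining to the copied reference table,
-- instead of (or as well as) asking ServiceNow for a DV_ column.
SELECT inc.number,
       inc.short_description,
       grp.name AS dv_assignment_group
FROM dbo.incident_Tbl inc
     LEFT JOIN dbo.sys_user_group_Tbl grp
         ON grp.sys_id = inc.assignment_group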
1.1.10 Journal fields
Journal fields are another challenge. A journal field only provides a method of data entry on the form in ServiceNow; once entered, the text is actually copied to the sys_journal_field table. To make a journal field (such as work notes) available on your on-site database, you need to copy the sys_journal_field table (which can get quite large), and either join the data using the element_id matching the sys_id of the parent record, or come up with a method of re-assembling all the related journal field rows into a single text field that can be included. The following SQL Server View definition achieves this:
CREATE VIEW dbo.sys_journal_list_vw
AS
-- Re-assembles all journal entries for a record into a single text field,
-- one row per table / field / record combination.
SELECT q1.name [table_name],
q1.element [field_name],
q1.element_id [row_sys_id],
Left (q1.value, Len (q1.value) - 2) AS "Complete_Journal"
FROM (SELECT DISTINCT
sq1.element,
sq1.name,
sq1.element_id,
(SELECT '('
+ CONVERT (NVARCHAR (20), sq2.sys_created_on, 113)
+ '): ' + sq2.value + CHAR (13) AS [text()]
FROM dbo.sys_journal_field_Tbl sq2
WHERE sq2.[element_id] = sq1.[element_id]
ORDER BY sq2.sys_created_on DESC
FOR XML PATH ( '' ))
AS [value]
FROM dbo.sys_journal_field_Tbl sq1) q1
GO
1.1.11 Glide Lists (SlushBuckets)
FOR XML PATH is a useful SQL Server construct for flattening out references to multiple rows, and can also be used in dealing with the next challenge: Glide List fields.
Glide List (Slushbucket) fields are string fields that contain a comma-separated list of sys_ids, rather like a reference field. If a glide list field contains multiple references, showing the raw list of sys_ids on a report is not very readable. The following SQL helps present Glide List fields:
-- Flattens a glide list of sys_ids into a readable, comma-separated list of display values.
-- "field-list…" and the FROM clause (which aliases the main table as ci) are omitted for brevity.
SELECT field-list…,
ci.glide_list_field,
STUFF ( (SELECT ', ' + slush.name AS [text()]
FROM dbo.referenced_Tbl slush
WHERE ci.glide_list_field LIKE '%' + slush.sys_id + '%'
ORDER BY slush.name
FOR XML PATH ( '' )),
1,
2,
'')
AS dv_glide_list_field,
…field-list…
1.1.12 Availability of your on-site copy
Availability of a reporting solution can be important. Losing a table for minutes or hours at a time (even overnight) while its data re-populates is simply not an option.
A useful technique we developed for ODBC populated tables is casually referred to as "The old switch-a-roo": When extracting tables you start with an empty table that gradually fills up. We are only interested in the final product, assuming the extract completes successfully. With "The old switch-a-roo", we extract the ServiceNow data into a new table (named differently) based on the structure of the existing fully-populated table. When the Stored Procedure has completed the extract, it deletes the original table, then renames the new table into its place. A rename takes milliseconds, and the old data can continue to be queried right up to the point when the rename occurs. If the extract fails to complete, the rename isn't made and the old data remains available until technical support rectify the issue.
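In outline, the tail end of such a stored procedure looks something like the sketch below (table names are placeholders, @extract_succeeded is a hypothetical flag set earlier in the procedure, and real code would also need error handling around the extract itself):
-- Only swap the tables over if the extract into the _new table completed successfully.
IF @extract_succeeded = 1
BEGIN
    BEGIN TRANSACTION;

    -- Drop the previous copy and slide the freshly loaded table into its place.
    IF OBJECT_ID (N'dbo.cmn_department_Tbl', N'U') IS NOT NULL
        DROP TABLE dbo.cmn_department_Tbl;

    EXEC sp_rename N'dbo.cmn_department_Tbl_new', N'cmn_department_Tbl';

    COMMIT TRANSACTION;
END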
1.1.13 Stability
Stability of the ODBC driver can also be an issue. We have had several occasions when the ODBC driver fails to work, or even causes the SQL Server instance to fail. I won't comment further on the standard of support that ServiceNow offer for the ODBC driver, however we've been looking for an alternative for some time.
1.2 SnowMirror
SnowMirror is a tool that is maturing into a really useful option. Instead of using the ODBC driver, SnowMirror presents SOAP queries to your ServiceNow instance. A maximum of 3 SnowMirror connections are permitted (as standard), and whilst this is tight during Clean and Sync operations, when it completely clears out a table and downloads fresh data from the ServiceNow instance, the normal delta synchronisation methods it uses are significantly faster, easier and more stable than any in-house ODBC solution we've been able to come up with.
Whereas it might take a developer several hours to write, test and implement components (views, stored procedures etc) for an ODBC based extract, it generally takes seconds for someone with lower levels of technical skill in SnowMirror.
You simply tell SnowMirror which table to copy and what you want included, and it will initially copy, then maintain, the data in that table. It addresses DV fields by giving you a choice of including sys_ids, values or both. You have a choice of synchronisation fields, but generally this is based on the table's sys_id, which is unique within every ServiceNow table.
Deletions are handled using one of three methods: Audit, Truncate or Difference. The fastest is Audit, which uses the ServiceNow sys_audit_delete table to find out what's been deleted, and simply replicates that on the copy. Truncation would result in the table being re-built from scratch again, and would probably be the slowest option, particularly for large tables. Difference is a good compromise, where SnowMirror compares a list of keys between the on-site table and the ServiceNow table, and deletes those it finds are no longer present on ServiceNow.
You still have to deal with the issues of Glide-List fields and Journals, but the methods I described earlier can be used.
SnowMirror Enterprise edition also gives you a choice of database formats to use, so you are not tied in to SQL Server if you have an inherent dislike of Microsoft.
Labels: Analytics and Reports, Integrations
06-12-2015 08:32 AM
Great work. Thanks for the contribution, Paul.
06-15-2015 04:35 AM
Great. Thanks
06-15-2015 05:34 AM
Thanks for the review, Paul.
Just to let you know the GlideList (Slushbucket) and Journal features are on our development list. Does anybody else need this? The more people vote for features the sooner they are released.
Regards,
Pavel Müller
12-22-2015 10:20 AM
The challenge with this approach is that you're still limited by SOAP: semaphores, row limits, memory, performance. This is exactly why there is new functionality in Geneva for data exporting - too many scaling challenges using SOAP. The only way to increase scalability while limiting the performance impact to the system is to sit within ServiceNow itself. In this way, you can take advantage of the native functionality, where you have an intrinsic understanding of data relationships (i.e. table hierarchy, journal fields, attachments, reference fields...). Using SOAP, you rely on batch updates querying entire tables, with the only delta checking ability being the "updated" time.
The other inefficiency is that if you have other integration endpoints that need the same data, you're going to have to poll the table for each integration separately. This can quickly result in severe performance degradation or a full-blown outage, depending on the SQL queries generated combined with the user load and the size of the underlying tables. A truly enterprise solution will avoid this scenario by replicating data only once and allowing for multiple subscribers.
A better approach needs to incorporate the integration needs as a whole, not simply a tool for one integration endpoint. By pairing a native ServiceNow application with a transparent ESB service this is how Perspectium Replicator is making an evolutionary advancement in enterprise integration. Learn more here: www.perspectium.com