Large Data Exports (CSV/XML)
02-28-2012 03:57 AM
Hi guys,
I am trying to export large amounts of data (500,000+ rows) from ServiceNow as CSVs, in particular from the metric_instance_list table. The data is pulled by a daily cron job to populate another application that reports on these metrics.
It is my understanding that results from a CSV or XML export are ordered by sys_id. The problem with sys_id, of course, is that it is not sequential in time. For example, if I return the first 10 records by sys_id, i.e.
metric_instance_list.do?XML&sysparm_record_count=10, I get metric_instances in varying (non-chronological) sys_created_on order.
If one were to follow the example from http://wiki.service-now.com/index.php?title=Exporting_Data#Breaking_Up_Large_Exports, we would need to pull ALL the data from the entire metric_instance table daily, because new records will often sort before older ones due to the issue described above.
As this table holds a very large number of records (550,000 plus), I would like to avoid pulling them all out daily, as this puts a massive strain on production. Instead, I want to export only the metric_instances that have not already been exported (i.e. order by sys_created_on and, on the next cron run, pull only the records created after the date of the last fetched row), preferably as CSV.
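For what it's worth, here is a sketch of the incremental pull I have in mind. The instance URL is hypothetical, and the ORDERBY-inside-sysparm_query syntax is an assumption based on ServiceNow's encoded query format, not something I have confirmed works for exports:

```python
# Build a daily export URL that only fetches records created after the
# last run's high-water mark (hypothetical instance name).
from urllib.parse import urlencode

BASE = "https://example.service-now.com/metric_instance_list.do"

def export_url(last_created_on, limit=10000):
    """CSV export URL for records created strictly after last_created_on."""
    # Encoded query: range filter plus ORDERBY, joined with '^'.
    query = "sys_created_on>%s^ORDERBYsys_created_on" % last_created_on
    params = urlencode({"sysparm_query": query,
                        "sysparm_record_count": limit})
    return "%s?CSV&%s" % (BASE, params)

print(export_url("2012-02-25 19:02:51"))
```

The cron job would store the sys_created_on of the last row it received and feed it back in as the next run's lower bound.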
However, from following the API, if I try to run:
metric_instance_list.do?CSV&sysparm_order=sys_created_on&sysparm_record_count=10
both sysparm_record_count and sysparm_order are ignored, and 50,000 unordered records are returned.
If I attempt to do the same using XML,
metric_instance_list.do?XML&sysparm_order=sys_created_on&sysparm_record_count=10
the record limit is applied, but not the ordering on sys_created_on.
i.e. the ten records come back ordered by sys_id, with sys_created_on in no particular order:

sys_id                            sys_created_on
00000d8a47b31c0082313d3487093581  2012-01-27 15:56:16
000147d3871c2040f6acef7b89434d47  2012-02-25 19:02:51
0001c0784700200082313d3487093569  2012-02-05 00:23:46
0002843347e3940082313d3487093575  2012-01-20 11:38:31
0003c5c5189b984082310596993fa691  2011-12-30 17:05:26
001404b187402000f6acef7b89434d20  2012-02-08 03:10:48
001449f98718e800f6acef7b89434d12  2012-02-20 18:06:56
0014809418e3508082310596993fa607  2012-01-09 14:48:48
001482f987c02000055fe5b8c9434de7  2012-02-08 12:29:30
00150103874c2000055fe5b8c9434ddc  2012-02-12 05:08:01
The results are not returned in sys_created_on order; in fact, they are exactly the same as if the sysparm_order parameter were not used at all. I have repeated this with metric_instance_list.do?XML&sysparm_order=field_value&sysparm_record_count=10, which does not order by field_value either and returns the same result set as above. I can only assume that this parameter is being ignored entirely, that it does not work as described/inferred in the documentation, or that I am not calling it correctly.
Has anybody had experience exporting large amounts of data using the CSV/XML export feature? Can anyone explain how to use it so that results are sorted by something other than the default sys_id field, and perhaps why these parameters are not consistent across the different export types?
Thanks in advance!
Raj

07-24-2012 10:48 AM
Has anybody been able to help Raj?
I am also bumping into the 50k record limit.
Originally our team was going to use the more modern approach of a web service, but there is a hard limit of 250 records from that.
So, the second approach was to use a scheduled push of a CSV file to a designated FTP location via Jacob Anderson's scheduled data extract.
See: http://www.servicenowguru.com/integration/exporting-report-ftp/
In that article he states:
Although it is possible to use this export functionality to export large numbers of records (500K+) from your Service-now instance, you obviously should not do so without doing some performance testing first. Any large query against any database has the potential to adversely affect the performance of that database.
However, I have been unable to discover how you can get past the 50k hard limit.
Theoretically, you should be able to change the limit via one (or all) of the following system properties:
com.glide.processors.CSVProcessor.sysparm_record_count
com.glide.processors.CSVProcessor.max_record_count
com.glide.processors.CSVProcessor.record_count
(Intuited from: http://wiki.servicenow.com/index.php?title=Fall_2009_Notable_Changes)
I tried setting all three, in various combinations, to 100,000, but to no avail.
Has anyone else been able to get past this limit!?
Thanks,
Steven.
07-24-2012 11:46 AM
Hi Steven,
I ended up getting around this by pulling the data out incrementally, first by week and then, due to the sheer number of records, by day. ServiceNow support have not been much help in the matter, I'm afraid, so this seemed the only real option.
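The day-by-day chunking could be scripted along these lines (the function and query names are illustrative, not from the thread): generate one bounded sys_created_on window per day, so no single export ever approaches the 50k cap.

```python
# Generate one encoded-query date window per day in [start, end).
from datetime import date, timedelta

def daily_queries(start, end):
    """Yield a sys_created_on range query for each day between start and end."""
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        # Half-open window: >= midnight of 'day', < midnight of the next day.
        yield ("sys_created_on>=%s 00:00:00"
               "^sys_created_on<%s 00:00:00" % (day, nxt))
        day = nxt

qs = list(daily_queries(date(2012, 1, 1), date(2012, 1, 4)))
# three windows: Jan 1, Jan 2, Jan 3
```

Each query string would then be URL-encoded into a sysparm_query parameter for one export request.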
Regards
Raj

07-24-2012 12:05 PM
Raj:
Thanks. Yeah, we ended up getting around this the same way. Hard to believe we have to fall back on 1980s tech (CSV) to pull the data. We have put in a request for help via our sales rep, but nothing from that direction either. Also, no reason is given for why the throttling is in place and not configurable, just vague warnings about performance. I have run tests: record sets of 100k records are pretty fast, and the combined data sizes run between 5 MB and 10 MB, which downloads quickly via FTP. That's pretty small for CSV FTP transfers, so I'm not sure of the reasoning. We use several SaaS products; Service-Now and one other have this throttling in place with no documented explanation and no vendor-suggested workaround.
Hopefully we will get an answer soon.
Steven.

07-24-2012 04:12 PM
Raj:
Thanks, yes, our team ended up implementing pretty much the same thing.
According to Jacob Anderson, the com.glide.processors.XMLProcessor.max_record_count property should have done the trick with his Scheduled Data Extract script. However, I tested it, and the limit is still 50k no matter what the setting.
I'm working on a sys_keys solution with an external web service, pulsing across 250 records at a time. However, it appears that even then you don't get all the sys_keys back, so there seems to be a limit there as well (probably on the data allowed in a single XML tag).
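The 250-at-a-time "pulse" amounts to a paging loop. Here's a minimal sketch of that loop, with fetch_chunk standing in for the actual web-service call (the windowing-by-row-offset style is an assumption; this is not Steven's actual code):

```python
# Page through a result set in fixed-size windows until a short chunk
# signals the end. fetch_chunk(first, last) is a placeholder for the
# real call that returns rows in the half-open window [first, last).
def pull_all(fetch_chunk, page=250):
    """Collect every row by repeatedly fetching 'page'-sized windows."""
    rows, first = [], 0
    while True:
        chunk = fetch_chunk(first, first + page)
        rows.extend(chunk)
        if len(chunk) < page:  # short or empty chunk: no more data
            break
        first += page
    return rows

# Example against a fake backend of 600 rows:
data = list(range(600))
fake = lambda a, b: data[a:b]
print(len(pull_all(fake)))  # prints 600
```

If the service itself truncates the key list, though, no client-side paging will recover the missing keys; the windows have to be constrained server-side (e.g. by date range) instead.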
Still digging on this, but for now the report solution seems to be the only one available.
Steven.