Large Data Exports (CSV/XML)

rajabhadury
Kilo Explorer

Hi guys,

I am trying to export large amounts of data (more than 500,000 rows) from ServiceNow, specifically the metric_instance table, as CSVs. The data is pulled by a daily cron job to populate another application that reports on these metrics.

It is my understanding that the results of both CSV and XML exports are ordered by sys_id. The problem with the sys_id field, of course, is that it takes no account of time, as it is not a sequential field. For example, if I return the first 10 records by sys_id, i.e.

metric_instance_list.do?XML&sysparm_record_count=10

I get metric instances in varying (non-chronological) sys_created_on order.

If one were to follow the example at http://wiki.service-now.com/index.php?title=Exporting_Data#Breaking_Up_Large_Exports, we would need to pull ALL the data from the entire metric_instance table daily, as new records will often sort before many of the old ones due to the issue described above.
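
For reference, that wiki approach boils down to a series of chunked exports sliced on sys_id, along these lines (the query values here are illustrative only):

metric_instance_list.do?CSV&sysparm_query=sys_idSTARTSWITH0
metric_instance_list.do?CSV&sysparm_query=sys_idSTARTSWITH1
...

Because sys_id is effectively random, each chunk is an arbitrary slice of the table rather than a time window, so every chunk has to be re-pulled every day.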

As this table holds a very large number of records (550,000 plus), I would like to avoid pulling them all out daily, as that puts a massive strain on production. Instead, I want to export/query only the metric instances that have not already been exported (i.e. order by sys_created_on and, on the next cron run, pull only the fresh records created after the date of the last fetched row), preferably as CSV.
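
Conceptually, that would be a watermark query along these lines, where the date is the sys_created_on of the last row fetched by the previous run (the value here is a placeholder, and would be URL-encoded in practice):

metric_instance_list.do?CSV&sysparm_query=sys_created_on>2012-02-25 19:02:51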

However, following the API documentation, if I try to run:

metric_instance_list.do?CSV&sysparm_order=sys_created_on&sysparm_record_count=10

both sysparm_record_count and sysparm_order are ignored, and 50,000 unordered records are returned.

If I attempt to do the same using XML,

metric_instance_list.do?XML&sysparm_order=sys_created_on&sysparm_record_count=10

the record limit is honored, but not the ordering on sys_created_on.

i.e.

<xml>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Sam Parker</field_value>
        <id>74ebba6147bf980082313d3487093565</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-01-27 15:56:16</sys_created_on>
        <sys_id>00000d8a47b31c0082313d3487093581</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-01-27 15:56:16</sys_updated_on>
        <table>incident</table>
        <value>Sam Parker</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Ravi Singh</field_value>
        <id>65fefa938750e000055fe5b8c9434d66</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-25 19:02:51</sys_created_on>
        <sys_id>000147d3871c2040f6acef7b89434d47</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-25 19:02:51</sys_updated_on>
        <table>incident</table>
        <value>Ravi Singh</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Mark Thomas</field_value>
        <id>21ff73a44700200082313d3487093572</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-05 00:23:46</sys_created_on>
        <sys_id>0001c0784700200082313d3487093569</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-05 00:23:46</sys_updated_on>
        <table>incident</table>
        <value>Mark Thomas</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Mary Summers</field_value>
        <id>8d0180ff47a3940082313d3487093599</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-01-20 11:38:31</sys_created_on>
        <sys_id>0002843347e3940082313d3487093575</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-01-20 11:38:31</sys_updated_on>
        <table>incident</table>
        <value>Mary Summers</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Dane Smith</field_value>
        <id>763d43a818df584082310596993fa6ea</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2011-12-30 17:05:26</sys_created_on>
        <sys_id>0003c5c5189b984082310596993fa691</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2011-12-30 17:05:26</sys_updated_on>
        <table>incident</table>
        <value>Dane Smith</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Vica Honarae</field_value>
        <id>1d038c7187402000f6acef7b89434dad</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-08 03:10:48</sys_created_on>
        <sys_id>001404b187402000f6acef7b89434d20</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-08 03:10:48</sys_updated_on>
        <table>incident</table>
        <value>Vica Honarae</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Robert Raines</field_value>
        <id>82c3b3a58790a000055fe5b8c9434d09</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-20 18:06:56</sys_created_on>
        <sys_id>001449f98718e800f6acef7b89434d12</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-20 18:06:56</sys_updated_on>
        <table>incident</table>
        <value>Robert Raines</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Jamie Bolton</field_value>
        <id>11024c9c18a3508082310596993fa64f</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-01-09 14:48:48</sys_created_on>
        <sys_id>0014809418e3508082310596993fa607</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-01-09 14:48:48</sys_updated_on>
        <table>incident</table>
        <value>Jamie Bolton</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Inzi Raju</field_value>
        <id>b6c0b53987c02000055fe5b8c9434da1</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-08 12:29:30</sys_created_on>
        <sys_id>001482f987c02000055fe5b8c9434de7</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-08 12:29:30</sys_updated_on>
        <table>incident</table>
        <value>Inzi Raju</value>
    </metric_instance>
    <metric_instance>
        <calculation_complete>true</calculation_complete>
        <definition>02c1ba5d184bd04082310596993fa6c9</definition>
        <duration/>
        <end/>
        <field/>
        <field_value>Mary Summers</field_value>
        <id>f1bf688787cce000f6acef7b89434d1f</id>
        <start/>
        <sys_created_by>system</sys_created_by>
        <sys_created_on>2012-02-12 05:08:01</sys_created_on>
        <sys_id>00150103874c2000055fe5b8c9434ddc</sys_id>
        <sys_mod_count>0</sys_mod_count>
        <sys_updated_by>system</sys_updated_by>
        <sys_updated_on>2012-02-12 05:08:01</sys_updated_on>
        <table>incident</table>
        <value>Mary Summers</value>
    </metric_instance>
</xml>

The results are not returned in sys_created_on order; in fact, they are exactly the same as when the sysparm_order parameter is not used at all. I have repeated this with metric_instance_list.do?XML&sysparm_order=field_value&sysparm_record_count=10; this does not order by field_value either, and returns the same result set as above. I can only assume that the parameter is being ignored entirely, that it does not work as described/inferred in the documentation, or that I am not calling it correctly.
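
The one variant I have not been able to verify is embedding the sort in the encoded query itself, rather than in sysparm_order, e.g.:

metric_instance_list.do?XML&sysparm_query=ORDERBYsys_created_on&sysparm_record_count=10

ORDERBY is standard encoded-query syntax, but I cannot say whether the export processors honor it any better than they honor sysparm_order.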

Has anybody had experience exporting large amounts of data using the CSV/XML export feature? Could you please explain how to use the XML/CSV export so that results come back sorted by something other than the default sys_id field, and perhaps why these parameters are not consistent across the different export types?


Thanks in advance!
Raj

6 REPLIES

sabell2012
Mega Sage

Has anybody been able to help Raj?

I am also bumping into the 50k record limit.

Originally our team was going to use the more modern approach of a web service, but that has a hard limit of 250 records per query.
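
If that 250-record cap is a per-query window rather than an absolute ceiling, it may be possible to page through it with the __first_row/__last_row parameters of the direct SOAP interface. A rough Python sketch of the idea (the instance URL, credentials, and result parsing are placeholders, and I have not tested this end to end):

import requests

INSTANCE = "https://yourinstance.service-now.com"  # placeholder instance
AUTH = ("soap.user", "password")                   # placeholder credentials
WINDOW = 250                                       # rows per getRecords call

# Minimal direct-SOAP envelope for metric_instance.do?SOAP; the
# __first_row/__last_row parameters window through the result set.
ENVELOPE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:tns="http://www.service-now.com/metric_instance">
  <SOAP-ENV:Body>
    <tns:getRecords>
      <__first_row>{first}</__first_row>
      <__last_row>{last}</__last_row>
    </tns:getRecords>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

first = 0
while True:
    body = ENVELOPE.format(first=first, last=first + WINDOW)
    resp = requests.post(
        INSTANCE + "/metric_instance.do?SOAP",
        data=body,
        headers={"Content-Type": "text/xml"},
        auth=AUTH,
    )
    resp.raise_for_status()
    if "<getRecordsResult>" not in resp.text:
        break  # empty window: we have walked past the last record
    # ... parse resp.text and append the rows to the output file ...
    first += WINDOW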

So the second approach was to use a scheduled push of a CSV file to a designated FTP location via Jacob Anderson's scheduled data extract.

See: http://www.servicenowguru.com/integration/exporting-report-ftp/

In that article he states:
Although it is possible to use this export functionality to export large numbers of records (500K+) from your Service-now instance, you obviously should not do so without doing some performance testing first. Any large query against any database has the potential to adversely affect the performance of that database.

However, I have been unable to discover how you can get past the 50k hard limit.

Theoretically, you should be able to change the limit via one (or all) of the following properties in System Properties:

com.glide.processors.CSVProcessor.sysparm_record_count
com.glide.processors.CSVProcessor.max_record_count
com.glide.processors.CSVProcessor.record_count

(Intuited from: http://wiki.servicenow.com/index.php?title=Fall_2009_Notable_Changes)

I tried setting all three, in various combinations, to 100,000, but to no avail.

Has anyone else been able to get past this limit!?

Thanks,

Steven.


Hi Steven,

I ended up getting around this by extracting the data incrementally, first by week and then by day, due to the sheer number of records. ServiceNow support have not been much help in the matter, I'm afraid, so this seemed the only real option.
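
For anyone finding this later, the daily pull amounts to something like the sketch below. The instance URL and credentials are placeholders; the key point is that sysparm_query date filters are honored by the CSV export even though sysparm_order is not:

import datetime
import requests

INSTANCE = "https://yourinstance.service-now.com"  # placeholder instance
AUTH = ("export.user", "password")                 # placeholder credentials

def pull_day(day):
    """Export one day's metric_instance rows as CSV.

    An encoded sysparm_query on sys_created_on keeps each run to a
    single day's window instead of the whole 550k-row table.
    """
    start = "%s 00:00:00" % day
    end = "%s 00:00:00" % (day + datetime.timedelta(days=1))
    query = "sys_created_on>=%s^sys_created_on<%s" % (start, end)
    resp = requests.get(
        INSTANCE + "/metric_instance_list.do",
        params={"CSV": "", "sysparm_query": query},  # bare CSV flag, as in the URLs above
        auth=AUTH,
    )
    resp.raise_for_status()
    return resp.content

# cron runs this shortly after midnight for the previous day
yesterday = datetime.date.today() - datetime.timedelta(days=1)
with open("metric_instance_%s.csv" % yesterday, "wb") as fh:
    fh.write(pull_day(yesterday))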

Regards
Raj


sabell2012
Mega Sage

Raj:

Thanks. Yeah, we ended up getting around this the same way. Hard to believe we have to fall back on 1980s tech (.csv) to pull the data. We have put in a request for help via our sales rep, but still nothing from that direction either. Also, no reason is given for why the throttling is in place and not configurable, just vague warnings about performance. I have run tests: record sets of 100k records come back pretty fast, and the combined data sizes run between 5 MB and 10 MB, which downloads quickly via FTP. Pretty small for CSV FTP transfers, so I am not sure of the reasoning here. We use several SaaS products; ServiceNow and one other have this throttling in place with no documented explanation and no vendor-suggested workaround.

Hopefully we will get an answer soon.

Steven.


sabell2012
Mega Sage

Raj:

Thanks, yes, our team ended up implementing pretty much the same thing.

According to Jacob Anderson, the com.glide.processors.XMLProcessor.max_record_count property should have done the trick with his Scheduled Data Extract script. However, I tested it, and the cap is still 50k no matter what the setting.

I'm working on a sys_keys solution with an external web service, pulsing across 250 records at a time. However, it appears that even with that you don't get all the sys_keys back, so there seems to be a limit there as well (probably on the amount of data allowed in a single XML tag).
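
The shape of what I'm attempting, for anyone curious, is keyset pagination on sys_id: always ask for the sys_ids greater than the last one seen, in fixed-size slices. A rough sketch of the idea against the XML export (URL, auth, and batch size are placeholders, and I haven't confirmed the export honors the sys_id comparison):

import requests
import xml.etree.ElementTree as ET

INSTANCE = "https://yourinstance.service-now.com"  # placeholder instance
AUTH = ("export.user", "password")                 # placeholder credentials
BATCH = 250                                        # slice size per request

last_sys_id = "0" * 32  # sorts before every real sys_id
while True:
    resp = requests.get(
        INSTANCE + "/metric_instance_list.do",
        params={
            "XML": "",
            "sysparm_query": "sys_id>%s" % last_sys_id,
            "sysparm_record_count": BATCH,
        },
        auth=AUTH,
    )
    resp.raise_for_status()
    records = ET.fromstring(resp.content).findall("metric_instance")
    if not records:
        break  # walked off the end of the table
    # relies on the default sys_id ordering described earlier in the thread
    for rec in records:
        pass  # ... write each record out ...
    last_sys_id = records[-1].findtext("sys_id")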

Still digging on this, but for now the report solution seems to be the only one available.

Steven.