- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-19-2016 11:01 AM
I am hitting different tables via the Table API for data extraction from one of our ServiceNow instances. This is a pretty large instance and contains many domains / view domains / etc. Not sure what is happening internal to ServiceNow, but my results seem inconsistent.
1) When paginating data using sysparm_offset and sysparm_limit, duplicates are returned in the response stream.
2) On one particular table (cmdb_ci_vm_instance), pagination only returns 1499 results, whereas bulk pulling all records results in 2000-400 records before a timeout occurs. (Table has over 5000 records in it, for the same company.)
My queries are pretty straight forward.. Here is an example for the Company table providng duplicates:
{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=0&sysparm_exclude_reference_link=true
{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=750&sysparm_exclude_reference_link=true
{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=1500&sysparm_exclude_reference_link=true
I get two cases where a sys_id is duplicated twice in the data stream. I can verify the duplicates are actually in the data stream by viewing the raw resposne before I convert the JSON within my application and it's repeatable. Different batch sizes produce different duplication results.
The cmdb_ci_vm_instance table is especially frustrating. I have tried the same approach as above for paginating results, and included an orderby hoping that would solve my problem (it did not):
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=0&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=750&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
When using the above, I get 1499 records returned consistently. These records are a subset of the over 5000 records I am after in the table, (Same company sys_id but not a full set returned.) If I remove the limit/offset, the query times out but still gives me a partial data set of 2000-4000 records (inconsistent, but due to the timeout occuring.)
What gives? What else can I look at to resolve this? Am i doing something fundamentally wrong or is this a potential bug due to our environment specific settings?
Solved! Go to Solution.
- Labels:
-
Integrations
-
Scripting and Coding
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-19-2016 01:09 PM
I am almost certain that this is related to my access, and how ServiceNow internally filters out data which is not accessible via the View Domains. We have many many customers, and not all data is treated equal. My account has view domains associated with it, and I believe what is happening is that the backend database is queried for data, and then some mid range process is filtering out the raw data that is 'not allowed' due to permissions. This results in interesting things like this:
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=0&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
2016-08-19 12:46:15,277 [INFO ] Returned Records Count: 750
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=750&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
2016-08-19 12:46:28,243 [INFO ] Returned Records Count: 746
I asked for 750, but the final response was missing 4 records. I increment my offset by 746 in this case, because that's how many records were returned (even though I asked for 750.) I then continue my processing until I get all records and wind up with *4* duplicates. I used to stop my processing when returned records was LESS than my sysparm_limit (as that should indicate the end of the data stream) but due to this bug it's not reliable.
The workaround is simple: Instead of breaking my pagination process when returned records is less than the sysparm_lmit value, I continue to process and ask for more records until the returned records == 0. I wind up with duplicates (in the above case, 4) which I can then filter out in memory before I use the data with a simple records.GroupBy(x => x.sys_id).Select(x => x.First()). I get all distinct values and the duplicates are removed, which is exactly what I am after.
This seems like a broken process (or at least not well documented on the Wiki) - but for now at least I have a work around which seems to resolve both of my issues above.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-19-2016 01:09 PM
I am almost certain that this is related to my access, and how ServiceNow internally filters out data which is not accessible via the View Domains. We have many many customers, and not all data is treated equal. My account has view domains associated with it, and I believe what is happening is that the backend database is queried for data, and then some mid range process is filtering out the raw data that is 'not allowed' due to permissions. This results in interesting things like this:
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=0&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
2016-08-19 12:46:15,277 [INFO ] Returned Records Count: 750
{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=750&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id
2016-08-19 12:46:28,243 [INFO ] Returned Records Count: 746
I asked for 750, but the final response was missing 4 records. I increment my offset by 746 in this case, because that's how many records were returned (even though I asked for 750.) I then continue my processing until I get all records and wind up with *4* duplicates. I used to stop my processing when returned records was LESS than my sysparm_limit (as that should indicate the end of the data stream) but due to this bug it's not reliable.
The workaround is simple: Instead of breaking my pagination process when returned records is less than the sysparm_lmit value, I continue to process and ask for more records until the returned records == 0. I wind up with duplicates (in the above case, 4) which I can then filter out in memory before I use the data with a simple records.GroupBy(x => x.sys_id).Select(x => x.First()). I get all distinct values and the duplicates are removed, which is exactly what I am after.
This seems like a broken process (or at least not well documented on the Wiki) - but for now at least I have a work around which seems to resolve both of my issues above.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-11-2019 10:40 AM
Having the same problem here. A simple test, query just the sys_id field for cmdb_ci, with sysparm_limit set to 1,000 and page through the results using sysparm_offset =0, 1000, 2000 etc. A HUGE number of duplicates are returned (in our tests, about 10% were dupes). Worse, with every query, a different number were returned.
This is a huge problem, as it makes the ServiceNow REST API non-deterministic.
HOWEVER, there is a fix:
You just need to add ORDERBYsys_id to the query. This seems to always work.
This is a big mistake on ServiceNow's part - their paging doesn't work because they don't apply an implicit orderby.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-11-2019 11:27 AM
Hi David -
This is quite old but from what I recall, not only did I have to order the data (as you suggested) but also increment my offset by my batch request size. So: if you ask for 1000 records and only get 800 (because 200 were filtered out by SNOW), then you still increment your offset by 1000. Also, continue pulling records until you reach the total. The total number of records will be in the headers of the API response from SNOW.
After I did that, I got much better results (no duplicates, knew when I was at the end of the data set.)
Hope that helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-11-2019 04:47 PM
Thanks JD, that is indeed what we are doing, and it seems to be working 🙂