Table REST API returns duplicates / inconsistent data sets

j_d
Giga Contributor

I am hitting different tables via the Table API for data extraction from one of our ServiceNow instances.   This is a pretty large instance and contains many domains / view domains / etc.   Not sure what is happening internal to ServiceNow, but my results seem inconsistent.

1) When paginating data using sysparm_offset and sysparm_limit, duplicates are returned in the response stream.

2) On one particular table (cmdb_ci_vm_instance), pagination only returns 1499 results, whereas bulk pulling all records results in 2000-400 records before a timeout occurs.   (Table has over 5000 records in it, for the same company.)

My queries are pretty straight forward.. Here is an example for the Company table providng duplicates:

{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=0&sysparm_exclude_reference_link=true

{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=750&sysparm_exclude_reference_link=true

{0}/api/now/table/core_company?sysparm_limit=750&sysparm_offset=1500&sysparm_exclude_reference_link=true

I get two cases where a sys_id is duplicated twice in the data stream.   I can verify the duplicates are actually in the data stream by viewing the raw resposne before I convert the JSON within my application and it's repeatable.   Different batch sizes produce different duplication results.

The cmdb_ci_vm_instance table is especially frustrating.   I have tried the same approach as above for paginating results, and included an orderby hoping that would solve my problem (it did not):

{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=0&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id

{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=750&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id

When using the above, I get 1499 records returned consistently.   These records are a subset of the over 5000 records I am after in the table, (Same company sys_id but not a full set returned.)   If I remove the limit/offset, the query times out but still gives me a partial data set of 2000-4000 records (inconsistent, but due to the timeout occuring.)  

What gives?   What else can I look at to resolve this?   Am i doing something fundamentally wrong or is this a potential bug due to our environment specific settings?

1 ACCEPTED SOLUTION

j_d
Giga Contributor

I am almost certain that this is related to my access, and how ServiceNow internally filters out data which is not accessible via the View Domains.   We have many many customers, and not all data is treated equal.   My account has view domains associated with it, and I believe what is happening is that the backend database is queried for data, and then some mid range process is filtering out the raw data that is 'not allowed' due to permissions.   This results in interesting things like this:



{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=0&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id


2016-08-19 12:46:15,277 [INFO ] Returned Records Count: 750



{0}/api/now/table/cmdb_ci_vm_instance?sysparm_offset=750&sysparm_limit=750&sysparm_exclude_reference_link=true&sysparm_query=ORDERBYsys_id


2016-08-19 12:46:28,243 [INFO ] Returned Records Count: 746



I asked for 750, but the final response was missing 4 records.   I increment my offset by 746 in this case, because that's how many records were returned (even though I asked for 750.)   I then continue my processing until I get all records and wind up with *4* duplicates.   I used to stop my processing when returned records was LESS than my sysparm_limit (as that should indicate the end of the data stream) but due to this bug it's not reliable.



The workaround is simple:   Instead of breaking my pagination process when returned records is less than the sysparm_lmit value, I continue to process and ask for more records until the returned records == 0.   I wind up with duplicates (in the above case, 4) which I can then filter out in memory before I use the data with a simple records.GroupBy(x => x.sys_id).Select(x => x.First()).   I get all distinct values and the duplicates are removed, which is exactly what I am after.



This seems like a broken process (or at least not well documented on the Wiki) - but for now at least I have a work around which seems to resolve both of my issues above.


View solution in original post

6 REPLIES 6

Hello ,

 

I am also facing the same issue with servicenow Table API as it is returning me duplicate sys_id's

 

Here is our sample service now uri's with batch of 900 records:

/api/now/table/alm_asset?&sysparm_display_value=false&sysparm_limit=900&sysparm_query=sys_created_on%3E2019-04-30+18%3A18%3A30.000%5EORDERBYsys_id&&sysparm_offset=0"

Record Count :900

/api/now/table/alm_asset?&sysparm_display_value=false&sysparm_limit=900&sysparm_query=sys_created_on%3E2019-04-30+18%3A18%3A30.000%5EORDERBYsys_id&&sysparm_offset=900"

Record Count:900

 

/api/now/table/alm_asset?&sysparm_display_value=false&sysparm_limit=900&sysparm_query=sys_created_on%3E2019-04-30+18%3A18%3A30.000%5EORDERBYsys_id&&sysparm_offset=1800

 

Record Count:900

 

I am already appending  ORDERBYsys_id to the query .However i am still getting duplicate sys_id's.

 

Can anyone please let me know how I can fix this duplicate sys_id issue?

risanchez
Giga Expert

We were experiencing the same pagination issue with the SOAP API, so this is not isolated to the REST API in ServiceNow. The solution that worked for us was to force sorting by sys_created_on and then by sys_id on every page requested. The bug occurs less frequently when extracting data in larger chunks. The bug is difficult to detect because ServiceNow will return the correct number of rows, except that it will skip some rows and it will return the same rows multiple times to make up for the ones that it skipped, giving you the false sense that it extracted all the rows correctly if you simply did a row count check.