We are receiving duplicate values while retrieving records by pagination using APIs

claude_elements · ‎03-15-2022

Hi ServiceNow team,

We are having an API integration in place and we are leveraging the same to retrieve records from ServiceNow tables such as - sn_install_base_item to retrieve the item records. However, it was observed that if we are retrieving these records while creating new records in parallel, then we tend to receive duplicate sys_id(s) in the response.

Its also observed that a single page size of the response does not have duplicate but some same sys_id(s) are found in the subsequent pages. For example, if Page 1 (200 records) has a sys_id like "abcd", then the same sys_id would be seen in a later page like page 5 or 6 but not in page 1 itself

Due to this, we have observed a loss of data during retrieval because the duplicates are replacing the original valid sys_ids which is a major concern for us

Please confirm if this is existing known behaviour of servicenow and is being fixed, or a bug?
We found the below articles which talks about certain workarounds on this concept. We tried all the options mentioned in this, but still receiving duplicates
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0727636
https://community.servicenow.com/community?id=community_question&sys_id=bd7f8725dbdcdbc01dcaf3231f961949
https://support.servicenow.com/kb?id=kb_article_view&sysparm_article=KB0622463

Request URL: https://dev111679.service-now.com/api/now/v1/table/sn_install_base_item?sysparm_query=sys_updated_on>=1950/01/01 00:00:00^ORDERBYsys_id&sysparm_limit=200&sysparm_offset=0

We increment the sysparam_offset by 200 in every subsequent request

Giles Lewis · ‎03-18-2022

This is known behavior of the REST API. I am not aware of any plans to "fix" it. It may be a byproduct of the stateless nature of the API. ORDERBYsys_id helps, but you can still have issues if there are inserts and/or deletes happening in the background while your queries are running.

A workaround is to NOT use sysparm_offset. Instead, order your results by sys_id (which you are already doing) and add an additional clause to your sysparm_query as follows:

sys_id>{last_sys_id}

where {last_sys_id} is the maximum sys_id value from the previous query.

Another option is to note the start time of your process, and exclude any records inserted after the start of the process

sys_created_on<{query_start_time}

This solves the problem caused by inserts, but you will still have issues if there are deletes happening. Therefore I think the first workaround is probably better.

claude_elements · ‎03-21-2022

@Giles Lewis Since sys_id is an alphanumeric value that is basically a string, will the string comparison be possible with `>` operator ? If yes, what's the comparison criteria for comparing two sys_ids?