Table Rotation - Extension vs Shard
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-06-2022 07:27 AM
Ahoy,
let me just jump straight into the topic, there is this sys_table_rotation in ServiceNow, with Type field, which offers you the following:
- Rotation (which rotates operational data into N tables, and upon last rotation, oldest data are truncated)
- Extension (which splits operational table into N+1 tables, creating more tables and keeping them forever)
- Shard (which... does what exactly?)
We have a table with large data set and we want to preserve it for ~10 years, meaning there can be around ~50M records. At the same time, we want to have fast running queries and list/form loads at least for operational/new data, so we would like to "shard" this table into multiple tables, exactly as Extension does, but not based on sys_created_on, but based on custom field (shard_by). However, when you pick Type=Shard, it OOB creates 16 Shard tables and sets shard_by field to sys_id (which does not help).
Does any of you guys ever worked with this, can provide any insight into Shard, provide docs, examples, anything?
Thank you!
- Labels:
-
Architect
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-06-2022 07:55 AM
According to this doc, Shard is based on document id, where are extension and rotation are time based: https://docs.servicenow.com/bundle/tokyo-platform-administration/page/administer/platform-performanc...
It also links to this document further explaining the three types:
This article might also be helpful:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-06-2022 08:42 AM
Yes, but what is the benefit of sharding by sys_id? If you lets say load 5M records into the table, and there are 16 shards distributed by shard_id=sys_id, does these 16 table get equally populated? What would be the benefit then? Thanks for the reply!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-06-2022 08:46 AM
OK I just examined shard rotation on sys_activity table and seems like (in case of our instance) each shard (sys_activity0001, sys_activity0002, sys_activity0003...) has around 13k records, what supports my theory (that records are distributed equally).
But whats the benefit of it?
If you shard by time based field, then if you query e.g. records created on last month, they all belong to single shard hence the query is very fast. But whats benefit of sharding by sys_id?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-06-2022 08:55 AM
Without knowing the details of how servicenow handles sharding in the actual database, my best guess is that sys_id based sharding allows for a more balanced sharding of data. You could have very high levels of data in a certain time frame, very low in another - so date based can run the risk of a partition having performance issues. By using sys_ids, they can run code (most likely hashing) to determine which shard that record should go into.