Preparing for Instance Data Replication
Summarize
Summary of Preparing for Instance Data Replication
Instance Data Replication (IDR) enables the synchronization of data between producer and consumer instances. Proper preparation is essential to ensure successful replication and minimize latency issues.
Show less
Key Features
- Geographic Considerations: Reduce latency by keeping both instances in the same geographic region.
- Table Selection: Avoid replicating system tables to prevent data integrity issues and replication failures.
- Hierarchical Relationships: Maintain parent-child relationships by replicating all related tables to ensure data consistency.
- Replication Sets: Organize replication sets based on either individual tables or groups of related tables, with a recommendation to limit groups to five tables for easier management.
- Column Selection: Choose only necessary columns to replicate, avoiding sys columns that may lead to unnecessary data replication.
- Consumer Preparation: Ensure the target tables in the consumer instance are empty before starting replication.
Key Outcomes
- Improved data integrity and performance by following best practices for table and column selection.
- Clear organization of replication sets enhances management and monitoring of replication processes.
- Reduced risk of errors related to business rules, ACLs, and data policies by reviewing and adjusting them prior to replication.
- Effective monitoring of replication status and error tracking through the Instance Data Replication > Replication Payload Error table.
Before you set up Instance Data Replication (IDR), analyze the tables and columns in the producer and consumer instances to replicate data successfully.
Preparing your instances
Reduce latency by ensuring that the producer and consumer instances are in the same geographic region and belong to the same customer. You can replicate data between instances across different geographic regions. However, replication latency can occur when instances are located in data centers in separate regions.
Deciding which tables to replicate
Before you create a producer replication set, decide which tables to replicate. Avoid replicating system tables (tables with a sys_ prefix). If you replicate system tables (like sys_user, sys_user_group, or sys_user_grmember) that have existing data in the consumer instance, insert and update failures can occur during replication. If you decide to replicate these tables, you may have additional work afterwards to clean these tables.
Avoid continuous replication of CMDB tables. Replicating CMDB data as changes occur can create performance issues or unforeseen consequences with replication due to the number of records involved. If you must replicate CMDB tables, consider scheduling replication or use conditions to constrain the count of replicated records and ensure all required columns are included in the replication set.
For a list of tables to avoid replicating, see Excluded tables in Instance Data Replication.
Analyzing table hierarchies
See Preserving table hierarchy in Instance Data Replication.
Analyzing table relationships
For each table you want to replicate, determine if the table has reference fields that point to other tables. If you are replicating a table with a reference field that points to another table, but you don’t include that table in your replication set, the reference field will be empty in the record on the consumer instance. Replicating table relationships maintains data integrity and ensures that reference fields are populated on the consumer as you'd expect.
Organizing replication sets
As part of the planning phase, determine how you want to organize your replication sets and the tables they contain.
- Create a single replication set for each table. This option requires more setup, configuration, and time management, but allows you to pause replication for a single table without impacting other tables.
- Replication is performed in parallel using multiple jobs.
- Throughput metrics are easier to track with separate replication sets.
- When an error occurs, replication for a single table is affected instead of all multiple tables.
- Create a replication set for a group of related tables. This option is easier to set up and manage, but oversight and potential issues impact all tables in the group.
- If you need find information about replication status, the logical name you use for this replication set might be easier to find and understand compared to using distinct replication sets for each table.
- If you want to pause replication, you can do so for the entire group of tables by pausing replication from one replication set.
- If a replication issue occurs on one table and replication stops, this stops replication for all tables in the replication set.
If you decide to group tables in a single replication set, try to limit it to five or fewer tables. If you have more than five tables to replicate, use multiple replication sets with five or fewer tables in each set.
Deciding which columns to replicate
For each table you want to replicate, decide which columns to include. Avoid including every column in your replication set by default. Instead, decide which columns are necessary, and exclude any sys_ columns or other columns that are updated by scripts automatically.
For example, if you include a column updated by the system automatically on a frequent basis, IDR may replicate data more often than necessary. When the system replicates this data, it can negatively impact the capacity subscription. Periodically check the IDR License and Usage Details dashboard to monitor the message count that is generated from the producer.
Preparing target tables on the consumer instance
By default, IDR uses a record's sys_id field as the lookup value to keep data synchronized between the producer and consumer instance. If the target table contains existing data or records from prior data imports, the sys_id values in the consumer don't match the sys_ids from the producer instance.
Always consider the producer instance as the source of truth. For optimal replication results, follow these guidelines:
- Make the producer instance the exclusive data source for the consumer instance.
- Prior to replication, ensure that the target table in the consumer instance is empty. Ideally, the initial records in the target instance are created from data sent exclusively from the producer via a replication set or a clone.
- Ensure that there are no other ongoing data imports or updates to target tables on the consumer instance after replication starts.
For example, if users from LDAP are imported to the consumer instance using a different data source, the sys_ids for those records don't match the sys_id values in the producer instance. In this scenario, existing user records in the target table on the consumer aren't updated and duplicate records are created.
In situations where this cannot be avoided, clean the tables on the consumer instance prior to replication. You must delete the records in the target table or ensure the sys_id values in producer and consumer are the same.
Alternatively, you can use a custom Coalesce field to identify unique records (instead of the default sys_id column) for replication. Use a custom Coalesce column when records on the consumer instance have a different sys_id for the same records on the producer instance. For more information on using a custom Coalesce, see Custom coalescing.
Reviewing business rules
You can use business rules to trigger workflows after replication, such as sending a notification or validating the replicated data. Review any business rules on the consumer instance that might prevent the idr.system user from inserting, updating, or deleting records in the target table.
If failures related to business rules occur, they appear in the table on the consumer instance. View the Error Message field for details on the script that is causing the failure.
With bidirectional replication, records created on the producer instance are replicated to a consumer instance and vice versa. When the record is inserted on the consumer instance and it triggers a business rule that updates the record, that update is not replicated back to the producer instance.
Reviewing ACLs
Review the ACLs in place on the target table in the consumer instance to ensure that IDR can replicate data successfully. Confirm that the idr.system user has the appropriate roles for the target table.
If failures related to ACLs occur, they appear in the table on the consumer instance. View the Error Message field for details on the ACL that is causing the failure.
Reviewing data policies
Review any data policies in place on the target table in the consumer instance to ensure that IDR can replicate data successfully. Confirm that the data you are replicating satisfies the data policy.
If failures related to data policies occur, they appear in the table on the consumer instance. View the Error Message field for details on the data policy that is causing the failure.