Monitoring queues in Instance Data Replication

  • Release version: Xanadu
  • Updated August 1, 2024

    Monitor the replication record queue, the produced and consumed message queues, and the messages processed for all replication sets through the Instance Data Replication (IDR) Queue Dashboard.

    Accessing the IDR Queue Dashboard

    Users with the admin or idr_admin role can access the dashboard.

    Access the IDR Queue Dashboard by navigating to Instance Data Replication > Queue Dashboard.

    IDR Queue Dashboard

    The IDR Queue Dashboard enables you to monitor the following:

    • IDR Queued Producer Records, which are the records queued for all tables, counted per hour over a 24-hour period or per day over a 5-day period.
    • Outbound Messages Remaining, which are messages remaining in the replication queue that have not yet been sent to the message queue.
    • Outbound Messages Processed, which are messages produced from this instance to the message queue.
    • Inbound Messages Remaining, which are messages remaining in the message queue that have not yet been processed.
    • Inbound Messages Processed, which are messages consumed on this instance.

    In any chart, select Last 24 Hours or Last 5 Days as the time period. For the 24-hour period, the number of messages is per hour. For the 5-day period, the number of messages is per day.
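
The two granularities can be illustrated with a minimal Python sketch. It assumes you have exported message timestamps from somewhere; the function name `bucket_counts`, the period labels `"24h"` and `"5d"`, and the sample data are hypothetical and are not part of IDR or the dashboard.

```python
from collections import Counter
from datetime import datetime


def bucket_counts(timestamps, period):
    """Count messages per hour ("24h") or per day ("5d").

    Hypothetical helper: the period labels mirror the dashboard's
    Last 24 Hours and Last 5 Days options but are not an IDR API.
    """
    if period == "24h":
        fmt = "%Y-%m-%d %H:00"  # one bucket per hour
    elif period == "5d":
        fmt = "%Y-%m-%d"        # one bucket per day
    else:
        raise ValueError(f"unknown period: {period!r}")
    return dict(Counter(ts.strftime(fmt) for ts in timestamps))


# Made-up sample timestamps for illustration only.
sample = [
    datetime(2024, 8, 1, 3, 5),
    datetime(2024, 8, 1, 3, 40),
    datetime(2024, 8, 1, 4, 10),
    datetime(2024, 8, 2, 3, 15),
]
hourly = bucket_counts(sample, "24h")
daily = bucket_counts(sample, "5d")
```

The same messages produce more, smaller buckets in the hourly view and fewer, larger buckets in the daily view, which is why a spike that stands out in the 24-hour chart can look flat in the 5-day chart.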

    Select the legend link under the chart to exclude that data source.

    Figure 1. IDR Queue Dashboard

    IDR Queued Producer Records

    With the IDR Queued Producer Records chart, you can see the number of records queued for each table over an hourly or daily period. It shows which tables account for the highest amount of traffic within IDR over time. Use this chart to identify activity spikes that cause predictable performance lags on the consumer instance.

    For example, if you see that a large spike of activity occurs every day at 3:00 a.m. due to a business rule on a table, you should expect a performance lag to occur on the consumer around that time.
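
If you copy the hourly counts out of the chart, a recurring spike like the one in this example can be found with a short script. The following is a sketch under stated assumptions: each day is a list of 24 hourly counts, a "spike" is any hour above a chosen multiple of that day's median, and the function name, the `factor` default, and the sample numbers are all hypothetical.

```python
from statistics import median


def recurring_spike_hours(daily_hourly_counts, factor=5.0):
    """Return hours of day (0-23) that spike on every day supplied.

    An hour counts as a spike when its count exceeds `factor` times that
    day's median hourly count. The threshold rule is an assumption made
    for this sketch, not something IDR defines.
    """
    spike_sets = []
    for counts in daily_hourly_counts:
        baseline = max(median(counts), 1)  # avoid a zero baseline
        spike_sets.append(
            {hour for hour, count in enumerate(counts) if count > factor * baseline}
        )
    if not spike_sets:
        return []
    # Keep only the hours that spiked on all days.
    return sorted(set.intersection(*spike_sets))


# Made-up data: both days spike at 03:00; day 2 also has a one-off bump at 15:00.
day1 = [100] * 24
day1[3] = 20000
day2 = [120] * 24
day2[3] = 18000
day2[15] = 900

spikes = recurring_spike_hours([day1, day2])
```

With this sample data, only hour 3 is reported, because the 15:00 bump does not repeat on both days.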

    Select All Tables or a specific table. Position your cursor over a point in the chart to see the queue count and tables for that point.

    Outbound Messages Remaining

    With the Outbound Messages Remaining chart, you can determine whether a producer instance is catching up to real-time replication after a large spike in activity.

    When replication is working correctly, the messages remaining count should be very low. If there is a spike in activity, for example when a business rule changes tens of thousands of records within a minute, you can expect a large value.

    You can also expect messages remaining to decrease over time as the jobs process the messages.

    If the messages remaining count continues to grow without resolution, it might indicate:
    • An issue processing the messages. For example, the IDRProducerJob is not running, or cannot send messages to the message queue.
    • The instance is recording changes faster than IDR can produce them.
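
The distinction between a backlog that is draining and one that keeps growing can be expressed as a simple rule of thumb. This is a minimal sketch, assuming you have a series of messages remaining counts read from the chart, oldest first; the function name, the `low_threshold` default of 100, and the three labels are hypothetical choices, not IDR settings.

```python
def classify_backlog(remaining, low_threshold=100):
    """Classify a series of messages remaining counts (oldest first).

    Returns "healthy" when the latest count is low, "growing" when the
    latest count is the highest seen in the series, and "draining"
    otherwise. The threshold is an illustrative assumption.
    """
    if not remaining:
        raise ValueError("need at least one data point")
    latest = remaining[-1]
    if latest <= low_threshold:
        return "healthy"
    if latest >= max(remaining):
        return "growing"
    return "draining"


# Made-up series for illustration only.
steady = classify_backlog([20, 35, 10])
recovering = classify_backlog([50, 40000, 22000, 9000])
stuck = classify_backlog([50, 4000, 9000, 15000])
```

A "growing" result over several consecutive periods is the situation described above, where it is worth checking whether the IDRProducerJob is running and can reach the message queue.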

    Select All Sets or a specific set. Position your cursor over a point in the chart to see the message count and replication set name for that point.

    Outbound Messages Processed

    With the Outbound Messages Processed chart, you can see the flow of records from a producer instance to the message queue over time.

    Trends for messages processed and messages remaining over time indicate whether replication is recovering from a lag or whether there are issues sending data to the message queue.

    Use this chart together with the IDR Queued Producer Records chart to see whether the instance is sending data. If your instance has queued records that are not being sent, it might indicate:
    • The instance is not able to run the producer job. For example, other resource-intensive processes are running on the instance and all worker threads are busy.
    • There is an issue connecting to the message queue.
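
Comparing the two charts amounts to looking for periods with queued records but no outbound messages processed. The sketch below assumes two equal-length lists of per-period counts copied from the charts, oldest first; the function name and the sample numbers are hypothetical.

```python
def trailing_stall_length(queued, processed):
    """Count the most recent consecutive periods with queued records
    but zero outbound messages processed.

    Both lists are per-period counts, oldest first, and are assumed to
    cover the same periods.
    """
    run = 0
    for q, p in zip(reversed(queued), reversed(processed)):
        if q > 0 and p == 0:
            run += 1
        else:
            break
    return run


# Made-up counts for illustration only.
queued_records = [120, 300, 250, 400]
stalled = trailing_stall_length(queued_records, [118, 0, 0, 0])
flowing = trailing_stall_length(queued_records, [118, 290, 260, 395])
```

A nonzero result that keeps increasing suggests the producer job is not running or cannot connect to the message queue, which are the two causes listed above.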

    To see the message count and replication set name for a graph point, select All Sets or a specific set and position your cursor over a point in the chart.

    Inbound Messages Remaining

    With the Inbound Messages Remaining chart, you can determine whether a consumer instance is catching up to real-time replication after a large spike in activity.

    You can expect a temporary large value when there is a spike in activity. The value normally decreases as the messages are processed.

    If this value continues to grow without resolution, it might indicate:
    • An issue processing the messages. For example, the IDRConsumerJob is not running, or cannot read messages from the message queue.
    • The producer instance is recording changes faster than IDR can consume them on this instance.

    Select All Sets or a specific set. Position your cursor over a point in the chart to see the message count and replication set name for that point.

    Inbound Messages Processed

    With the Inbound Messages Processed chart, you can see the flow of records for each consumer set over time.

    Use the inbound messages chart to determine which replication sets have the most traffic and see trends for messages processed and the messages remaining.

    If the producer is sending records to the message queue and the consumer is not processing them, it might indicate issues with the producer or the consumer instance.

    Position your cursor over a point in the chart to see the message count and replication set name for that point.