RabbitMQ default checks and policies

  • Release version: Zurich
  • Updated July 31, 2025
    Summarized using AI
    This content was generated using new OpenAI-powered functionality. Results are provided on an as is basis and are not guaranteed to be accurate or complete.

    Summary of RabbitMQ default checks and policies

    ServiceNow's Agent Client Collector offers a set of default checks and policies to monitor RabbitMQ health and performance, applicable only in Windows environments. Before running these checks, you must perform RabbitMQ discovery to ensure accurate monitoring.

    Default Event Checks

    The default event checks focus on verifying RabbitMQ server availability, cluster node status, consumer counts, message queues, network partitions, node health, node usage, queue drain times, queue synchronization, and STOMP responsiveness. Each check triggers alerts based on predefined or configurable thresholds, helping you proactively identify issues such as server downtime, unhealthy nodes, network splits, and performance bottlenecks.

    These checks use specific Ruby scripts with parameters for host, port, virtual host, thresholds, and other criteria, enabling tailored monitoring aligned with your RabbitMQ deployment.
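The way parameters turn into a command line can be sketched as follows. This is only an illustration of the effect of the `{{if .labels.params_x}} ... {{end}}` guards in the command templates, not Agent Client Collector's actual rendering code. The helper name `build_command` and the sample label values (`localhost`, `15672`, a warning threshold of `5`) are assumptions made for the example.

```python
import shlex


def build_command(script, labels, required, optional):
    """Assemble a check command line from label values.

    `required` and `optional` are lists of (flag, label_key) pairs.
    An optional flag is emitted only when its label is present and
    non-empty, which mirrors the {{if ...}} guards in the templates.
    """
    parts = [script]
    for flag, key in optional:
        value = labels.get(key)
        if value:  # skip missing or empty labels
            parts += [flag, str(value)]
    for flag, key in required:
        # A missing required label raises KeyError rather than
        # silently producing an incomplete command.
        parts += [flag, str(labels[key])]
    return " ".join(shlex.quote(p) for p in parts)


# Hypothetical label values; params_critical is deliberately not set.
labels = {"params_host": "localhost", "params_port": "15672", "params_warn": "5"}

command = build_command(
    "check-rabbitmq-consumers.rb",
    labels,
    required=[("--host", "params_host"), ("--port", "params_port")],
    optional=[("--warn", "params_warn"), ("--critical", "params_critical")],
)
print(command)
# check-rabbitmq-consumers.rb --warn 5 --host localhost --port 15672
```

Because `params_critical` is not set, the `--critical` flag is left out entirely, so the script would fall back to whatever default it defines.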

    Default Metrics Collection

    The default metrics policies provide detailed statistics about RabbitMQ server overview and per-queue metrics. These metrics allow you to track queue performance and server resource usage over time, supporting capacity planning and troubleshooting.

    Metrics collection scripts accept parameters such as host, port, and optionally virtual host, making them adaptable to your environment.

    Practical Benefits for ServiceNow Customers

    • Comprehensive RabbitMQ health monitoring: Detects server and cluster issues early, avoiding service disruptions.
    • Configurable alerts and thresholds: Helps tailor monitoring sensitivity to your operational requirements.
    • Enhanced visibility into queue performance: Enables proactive management of message backlogs and drain times.
    • Metrics-driven insights: Supports data-driven decisions for scaling and troubleshooting RabbitMQ infrastructure.
    • Windows environment support: The RabbitMQ checks are available only in a Windows environment.

    By leveraging these default checks and policies, you can maintain reliable RabbitMQ operations, promptly address issues, and optimize messaging infrastructure within your ServiceNow environment.

    Agent Client Collector provides the following default checks and policies for RabbitMQ health monitoring. You must perform RabbitMQ discovery before executing the checks. RabbitMQ checks are available only in a Windows environment.

    Table 1. RabbitMQ Events policy
    | Type | Check | Description | Command |
    | --- | --- | --- | --- |
    | Event | check-rabbitmq-alive | Verifies whether the RabbitMQ server is alive, using the REST API. If the server is down, an alert triggers. | check-rabbitmq-alive.rb --host {{.labels.params_host}} --port {{.labels.params_port}} -v {{.labels.params_vhost}} |
    | Event | check-rabbitmq-cluster-health | Verifies whether the RabbitMQ server's cluster nodes are running. If any node is down, an alert triggers. | check-rabbitmq-cluster-health.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
    | Event | check-rabbitmq-consumers | Verifies the number of consumers on the RabbitMQ server and triggers an alert based on the configured threshold. | check-rabbitmq-consumers.rb {{if .labels.params_warn}} --warn {{.labels.params_warn}} {{end}} {{if .labels.params_critical}} --critical {{.labels.params_critical}} {{end}} --host {{.labels.params_host}} --port {{.labels.params_port}} |
    | Event | check-rabbitmq-messages | Verifies the total number of messages queued on the RabbitMQ server and triggers an alert based on the configured threshold. | check-rabbitmq-messages.rb --critical {{.labels.params_critical}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --host {{.labels.params_host}} |
    | Event | check-rabbitmq-network-partitions | Verifies whether a RabbitMQ network partition has occurred. If a partition is detected, an alert triggers. | check-rabbitmq-network-partitions.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
    | Event | check-rabbitmq-node-health | Verifies whether the RabbitMQ server node is in a running state. | check-rabbitmq-node-health.rb --host {{.labels.params_host}} {{if .labels.params_watchalarms}} --alarms {{.labels.params_watchalarms}} {{end}} {{if .labels.params_socketwarn}} --swarn {{.labels.params_socketwarn}} {{end}} {{if .labels.params_memcrit}} --mcrit {{.labels.params_memcrit}} {{end}} {{if .labels.params_fdcrit}} --fcrit {{.labels.params_fdcrit}} {{end}} {{if .labels.params_socketcrit}} --scrit {{.labels.params_socketcrit}} {{end}} --port {{.labels.params_port}} {{if .labels.params_memwarn}} --mwarn {{.labels.params_memwarn}} {{end}} {{if .labels.params_fdwarn}} --fwarn {{.labels.params_fdwarn}} {{end}} |
    | Event | check-rabbitmq-node-usage | Checks and displays usage of the RabbitMQ server node. | check-rabbitmq-node-usage.rb {{if .labels.params_procwarn}} --pwarn {{.labels.params_procwarn}} {{end}} --port {{.labels.params_port}} {{if .labels.params_socketwarn}} --swarn {{.labels.params_socketwarn}} {{end}} --type {{.labels.params_type}} {{if .labels.params_diskcrit}} --dcrit {{.labels.params_diskcrit}} {{end}} {{if .labels.params_fdcrit}} --fcrit {{.labels.params_fdcrit}} {{end}} {{if .labels.params_proccrit}} --pcrit {{.labels.params_proccrit}} {{end}} {{if .labels.params_diskwarn}} --dwarn {{.labels.params_diskwarn}} {{end}} {{if .labels.params_socketcrit}} --scrit {{.labels.params_socketcrit}} {{end}} --host {{.labels.params_host}} {{if .labels.params_memcrit}} --mcrit {{.labels.params_memcrit}} {{end}} {{if .labels.params_fdwarn}} --fwarn {{.labels.params_fdwarn}} {{end}} {{if .labels.params_memwarn}} --mwarn {{.labels.params_memwarn}} {{end}} |
    | Event | check-rabbitmq-queue-drain-time | Estimates how long each queue on the RabbitMQ server will take to drain, based on the current message exit rate. For example, a queue that holds 1,000 messages and drains at 1 message per second needs about 1,000 seconds to empty. That exceeds the default critical level of 360 seconds, so an alert is generated. | check-rabbitmq-queue-drain-time.rb --host {{.labels.params_host}} --port {{.labels.params_port}} --warn {{.labels.params_warn}} --critical {{.labels.params_critical}} |
    | Event | check-rabbitmq-queues-synchronised | Verifies that all mirrored queues with secondary queues are synchronised. | check-rabbitmq-queues-synchronised.rb --host {{.labels.params_host}} --port {{.labels.params_port}} |
    | Event | check-rabbitmq-stomp-alive | Verifies whether the RabbitMQ server is alive and responding to STOMP. | check-rabbitmq-stomp-alive.rb --host {{.labels.params_host}} --queue {{.labels.params_queue}} --port {{.labels.params_port}} |
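The drain-time example given for check-rabbitmq-queue-drain-time in Table 1 can be worked through in a few lines. This is a minimal sketch of the arithmetic only, not the Ruby script's implementation. The critical default of 360 seconds comes from the description above; the warning level of 180 seconds, the function names, and the handling of a zero exit rate are assumptions made for the example.

```python
def drain_time_seconds(messages, exit_rate_per_second):
    """Estimate how long a queue needs to empty at the current exit rate."""
    if exit_rate_per_second <= 0:
        # Assumption: a non-empty queue that is not draining never empties.
        return float("inf") if messages > 0 else 0.0
    return messages / exit_rate_per_second


def classify(drain_seconds, warn, critical=360):
    """Map a drain time to a severity using warn/critical thresholds."""
    if drain_seconds > critical:
        return "critical"
    if drain_seconds > warn:
        return "warning"
    return "ok"


# The documented example: 1,000 messages leaving at 1 message per second.
estimate = drain_time_seconds(1000, 1)
severity = classify(estimate, warn=180)  # warn=180 is a hypothetical value
print(estimate, severity)
# 1000.0 critical
```

With the same thresholds, a queue that would drain in 200 seconds is classified as a warning and one that would drain in 100 seconds as ok.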
    Table 2. RabbitMQ Metrics policy
    | Type | Check | Description | Command |
    | --- | --- | --- | --- |
    | Metric | metrics-rabbitmq-overview | Provides RabbitMQ overview statistics. | metrics-rabbitmq-overview.rb --port {{.labels.params_port}} --host {{.labels.params_host}} |
    | Metric | metrics-rabbitmq-queue | Provides RabbitMQ metrics per queue. | metrics-rabbitmq-queue.rb --port {{.labels.params_port}} --host {{.labels.params_host}} {{if .labels.params_vhost}} --vhost {{.labels.params_vhost}} {{end}} |
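To show how the host, port, and optional virtual host parameters could map onto a data source, the sketch below builds RabbitMQ management HTTP API URLs for the overview and per-queue statistics. It assumes the metrics scripts read from that API (the page states this only for check-rabbitmq-alive) and that the API is served over plain HTTP. The function names and sample values are hypothetical; `/api/overview` and `/api/queues` are standard management API paths.

```python
from urllib.parse import quote


def overview_url(host, port):
    """URL for server-wide overview statistics."""
    return f"http://{host}:{port}/api/overview"


def queues_url(host, port, vhost=None):
    """URL for per-queue statistics, optionally limited to one virtual host."""
    base = f"http://{host}:{port}/api/queues"
    if vhost is None:
        return base  # all virtual hosts
    # The virtual host name must be percent-encoded; the default
    # virtual host "/" becomes "%2F".
    return f"{base}/{quote(vhost, safe='')}"


print(overview_url("localhost", 15672))
# http://localhost:15672/api/overview
print(queues_url("localhost", 15672, vhost="/"))
# http://localhost:15672/api/queues/%2F
```

Leaving out the virtual host corresponds to the case where `params_vhost` is not set and the `--vhost` flag is dropped from the command.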