Alerts in Instance Observer

  • Release version: Zurich
  • Updated July 31, 2025

    Summary of Alerts in Instance Observer

    ServiceNow Instance Observer offers a comprehensive alert system designed to monitor the health, performance, and user experience of your platform. These alerts help you proactively identify and address issues across various components including transactions, infrastructure, database, email processing, job execution, user activity, event handling, messaging, data growth, and application health. The alerts are categorized for clarity and actionable insights, enabling efficient platform management.

    Key Features

    • Transaction Monitoring: Detect anomalies such as transaction volume spikes or drops at both instance and node levels, response time degradations system-wide and per node, database latency, and slow queries impacting responsiveness.
    • Node Health Tracking: Monitor CPU usage, memory consumption, and JVM garbage collection delays on nodes and load balancer containers to prevent bottlenecks and failures.
    • Database Performance and Health: Alerts on CPU issues across primary, shard, and read replica hosts, replication lag, row lock waits, and abnormal growth at database and table levels to ensure data reliability and performance.
    • Email Processing: Detect delays or failures in outbound email delivery and issues in inbound email ingestion, ensuring timely communication flows.
    • Scheduler and Job Execution: Identify stuck schedulers, long-running jobs, and abnormal thread activity to maintain job lifecycle efficiency.
    • User Session and Activity Monitoring: Track login activities at instance and node levels to understand user behavior and session distribution.
    • Event Queue and Semaphore Management: Monitor semaphore wait times, queue depths, and backlog in event queues to aid in debugging event handling and throttling.
    • Asynchronous Messaging Bus (AMB): Provide observability into internal messaging by tracking outgoing message queue size and utilization for real-time app behavior insights.
    • Historical/List Data Volume Monitoring: Flag excessive growth in history or list tables that may degrade performance.
    • Application Host Health: Keep track of CPU overload conditions at the application layer to prevent service degradation.
    • AI/ML-Based Intelligent Alerts: Utilize AI-driven anomaly and pattern detection alerts (Auriga Intelligent) to identify complex or emerging issues proactively.

    Key Outcomes

    • Proactive identification of performance degradations and infrastructure bottlenecks across the entire ServiceNow platform.
    • Enhanced visibility into transaction health, database reliability, and user activity trends to support operational decision-making.
    • Improved email communication reliability through timely detection of inbound and outbound email issues.
    • Efficient job execution monitoring to reduce scheduler and job-related failures or delays.
    • Better event and messaging queue management, reducing the risk of processing backlogs and system slowdowns.
    • Data growth insights to manage storage and performance impacts effectively.
    • Use of AI/ML techniques to surface subtle or complex anomalies that traditional monitoring might miss.

    ServiceNow Instance Observer provides a comprehensive set of alerts designed to monitor platform health, performance, and user experience. These alerts are categorized for easy consumption and actionability.

    Transactions
    Monitors application transactions for anomalies, spikes, or degradations in performance such as:
    • Transaction Decrease: Detects a drop in total transaction volume
    • Transaction Decrease Node: Identifies transaction volume drop per node
    • Transaction Increase: Flags unexpected transaction surges
    • Transaction Increase Node: Highlights node-level transaction spikes
    • Response Time: Triggers when system-wide response time increases
    • Response Time Node: Flags nodes with degraded response times
    • Database Response Time: Monitors database-level latency impacting transactions
    • Slow Queries Per Second: Identifies the volume of slow database queries affecting responsiveness
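The source does not say how Instance Observer decides that transaction volume has dropped or surged. As a rough illustration only, the sketch below compares the current count with a baseline of recent counts using a z-score. The function name `classify_volume` and the `z_threshold` parameter are hypothetical and are not part of any ServiceNow API.

```python
from statistics import mean, stdev


def classify_volume(history, current, z_threshold=3.0):
    """Classify the current transaction count against recent history.

    history: recent per-interval transaction counts (assumed baseline).
    Returns "increase", "decrease", or "normal".
    This is an illustrative heuristic, not Instance Observer's actual logic.
    """
    if len(history) < 2:
        return "normal"  # too little data to estimate the spread

    baseline = mean(history)
    spread = stdev(history)

    if spread == 0:
        # Perfectly flat baseline: any change is a deviation.
        if current > baseline:
            return "increase"
        if current < baseline:
            return "decrease"
        return "normal"

    z = (current - baseline) / spread
    if z >= z_threshold:
        return "increase"
    if z <= -z_threshold:
        return "decrease"
    return "normal"
```

The same check could be run once over instance-wide counts and once per node, which mirrors the split between the instance-level and node-level alerts listed above.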

    Node health (CPU, memory, or garbage collection)
    Tracks node infrastructure health to avoid bottlenecks or failures:
    • Node CPU time: High CPU usage alert for a node
    • Node memory: Monitors memory consumption patterns
    • Node garbage collection time: Tracks JVM GC delays
    • Load balancer container CPU utilization: Flags CPU overload on LB containers
    • Load balancer container memory utilization: Detects memory exhaustion on LB containers

    Database performance and health
    Covers critical database indicators to verify query health and data reliability:
    • Database host health CPU: High CPU on primary DB host
    • Shards host health CPU: Resource issues on shard hosts
    • Read replica host health (CPU): Read-replica CPU anomalies
    • Standby replication lag: Lag in standby DB replication
    • InnoDB row lock: Frequency of row lock waits
    • Primary database growth: Flags abnormal growth in primary DB
    • Database table growth: Specific table-level growth indicators
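To make the growth alerts concrete, here is a minimal sketch of one way "abnormal growth" might be expressed: percentage change between two size samples, compared with a limit. The helper names, the `max_percent` limit, and the table names in the usage are assumptions for illustration; the source does not describe the thresholds Instance Observer applies.

```python
def growth_percent(previous_size, current_size):
    """Percentage change from previous_size to current_size."""
    if previous_size <= 0:
        raise ValueError("previous_size must be positive")
    return (current_size - previous_size) * 100.0 / previous_size


def tables_with_abnormal_growth(samples, max_percent=20.0):
    """Return the names of tables whose growth exceeds max_percent.

    samples: mapping of table name -> (previous_size, current_size).
    max_percent is a hypothetical limit, not a documented default.
    """
    return sorted(
        name
        for name, (previous, current) in samples.items()
        if growth_percent(previous, current) > max_percent
    )
```

For example, `tables_with_abnormal_growth({"table_a": (1000, 1100), "table_b": (1000, 1500)})` flags only `table_b`, because it grew by 50 percent while `table_a` grew by 10 percent.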

    Inbound and outbound email
    Promotes timely delivery and ingestion of email-based communications:
    • Outbound email: Delays or failures in outbound email processing
    • Inbound email: Issues in ingesting incoming emails

    Scheduler and job execution
    Helps detect issues in the job execution life cycle:
    • Scheduler stuck: Scheduler not progressing or blocked
    • Long-running jobs: Jobs exceeding typical run time
    • Specific long-running jobs: Custom job monitoring
    • Thread running: Threads running unusually long or in high volume
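"Jobs exceeding typical run time" implies a comparison against past runs. The sketch below shows one simple way to express that idea, using the median of earlier durations as the typical value. The function name and the `factor` multiplier are hypothetical; the source does not state how Instance Observer defines typical run time.

```python
from statistics import median


def is_long_running(past_durations_s, elapsed_s, factor=3.0):
    """Return True if a job has run much longer than its typical duration.

    past_durations_s: durations of earlier runs, in seconds.
    elapsed_s: how long the current run has been executing, in seconds.
    factor: assumed multiplier over the median that counts as "long".
    """
    if not past_durations_s:
        return False  # no history, so nothing to compare against
    return elapsed_s > factor * median(past_durations_s)
```

The median is used rather than the mean so that a single unusually slow earlier run does not inflate the baseline.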

    Session and user activity
    Tracks user login behavior across the instance and its nodes:
    • User session logged in – Instance: Login activity across the instance
    • User session logged in – Node: Per-node session metrics

    Event queue and semaphore management
    Critical for debugging platform event handling and job execution throttling:
    • Default semaphore mean: Semaphore wait time trends
    • Default semaphore QDepth: Depth of queued semaphore requests
    • Integrated semaphore: Monitors integrated semaphore contention
    • Event queue check: Tracks backlog in event queues
    • Specific queue for events: Custom event queue monitoring
    • High priority event queue: Monitors mission-critical event queues
    • ECC queue: External communication channel backlog alerts

    Asynchronous Messaging Bus (AMB)
    Internal messaging bus observability for real-time app behavior:
    • AMB send queue depth: Size of outgoing message queue
    • AMB send in use: Utilization of AMB sending capacity

    Historical or list data volume
    Monitors growth of historical or list data that can impact performance:
    • History list length: Flags excessive record count in history tables

    Application host health
    Monitors health at the application layer:
    • Application host health CPU: Application-tier CPU overload alerts

    AI/ML or intelligent alerting
    Includes alerts generated via AI/ML-based behavior analysis:
    • Auriga Intelligent: AI-driven anomaly or pattern detection alerts
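
For scripting or reporting, the categories above can be captured as a plain lookup table. The sketch below copies the category and alert names from this page. The structure and the `category_of` helper are illustrative assumptions, not an Instance Observer API or export format.

```python
# Alert names grouped by category, copied from the lists on this page.
ALERT_CATEGORIES = {
    "Transactions": [
        "Transaction Decrease", "Transaction Decrease Node",
        "Transaction Increase", "Transaction Increase Node",
        "Response Time", "Response Time Node",
        "Database Response Time", "Slow Queries Per Second",
    ],
    "Node health (CPU, memory, or garbage collection)": [
        "Node CPU time", "Node memory", "Node garbage collection time",
        "Load balancer container CPU utilization",
        "Load balancer container memory utilization",
    ],
    "Database performance and health": [
        "Database host health CPU", "Shards host health CPU",
        "Read replica host health (CPU)", "Standby replication lag",
        "InnoDB row lock", "Primary database growth",
        "Database table growth",
    ],
    "Inbound and outbound email": ["Outbound email", "Inbound email"],
    "Scheduler and job execution": [
        "Scheduler stuck", "Long-running jobs",
        "Specific long-running jobs", "Thread running",
    ],
    "Session and user activity": [
        "User session logged in – Instance",
        "User session logged in – Node",
    ],
    "Event queue and semaphore management": [
        "Default semaphore mean", "Default semaphore QDepth",
        "Integrated semaphore", "Event queue check",
        "Specific queue for events", "High priority event queue",
        "ECC queue",
    ],
    "Asynchronous Messaging Bus (AMB)": [
        "AMB send queue depth", "AMB send in use",
    ],
    "Historical or list data volume": ["History list length"],
    "Application host health": ["Application host health CPU"],
    "AI/ML or intelligent alerting": ["Auriga Intelligent"],
}


def category_of(alert_name):
    """Return the category for an alert name (case-insensitive), or None."""
    wanted = alert_name.strip().lower()
    for category, alerts in ALERT_CATEGORIES.items():
        if any(alert.lower() == wanted for alert in alerts):
            return category
    return None
```

A lookup such as `category_of("ECC queue")` returns `"Event queue and semaphore management"`, which can help route an alert to the team that owns that area.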