Alerts in Instance Observer
Summary of Alerts in Instance Observer
ServiceNow Instance Observer offers a robust alert system that continuously monitors the health, performance, and user experience of your platform. These alerts are organized into categories to simplify monitoring and enable prompt, targeted responses to potential issues.
Key Features
- Transactions: Detects anomalies such as drops or surges in transaction volume, degraded response times at system and node levels, and database-related latency issues like slow queries that affect responsiveness.
- Node Health: Monitors node infrastructure metrics, including CPU usage, memory consumption, and garbage collection time, as well as load balancer container CPU and memory utilization, to help prevent performance bottlenecks or failures.
- Database Performance and Health: Tracks CPU load on primary, shard, and read replica database hosts, standby replication lag, InnoDB row lock waits, and abnormal growth of databases and tables to help ensure data reliability and query efficiency.
- Email Processing: Alerts on delays or failures in both outbound and inbound email handling, supporting timely communication workflows.
- Scheduler and Job Execution: Identifies issues such as scheduler blocks, long-running jobs, and unusual thread activity to maintain smooth job lifecycle execution.
- Session and User Activity: Provides insights into user login patterns at both instance and node levels to monitor user engagement and detect potential anomalies.
- Event Queue and Semaphore Management: Facilitates debugging by monitoring semaphore wait times, queue depths, and backlog in critical event queues, including mission-critical and external communication channels.
- Asynchronous Messaging Bus (AMB): Observes real-time internal messaging behavior by tracking outgoing message queue sizes and utilization rates.
- Historical or List Data Volume: Flags excessive record counts in history or list tables that may impact system performance.
- Application Host Health: Alerts on CPU overload at the application layer to help maintain application stability.
- AI/ML or Intelligent Alerting: Utilizes AI-driven analysis to detect anomalies and patterns, providing proactive performance insights.
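One generic way to detect the drops or surges in transaction volume described for the Transactions category is to compare the current window's transaction count with a baseline of comparable past windows and flag large deviations. The following minimal Python sketch uses a standard-deviation (z-score) threshold. It is not Instance Observer's detection logic, which is not described here; the function names, node names, baseline data, and the threshold `k = 3.0` are hypothetical.

```python
# Hypothetical sketch: NOT Instance Observer's detection logic.
# Flags a drop or surge in transaction volume by comparing the current
# window's count with a baseline of past windows (z-score threshold).
from statistics import mean, stdev


def classify_volume(baseline: list[int], current: int, k: float = 3.0) -> str:
    """Return "decrease", "increase", or "normal" for one window.

    baseline: transaction counts from comparable past windows (assumed data).
    k: number of standard deviations treated as anomalous (illustrative value).
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: any change from the constant value counts as a deviation.
        if current < mu:
            return "decrease"
        if current > mu:
            return "increase"
        return "normal"
    z = (current - mu) / sigma
    if z <= -k:
        return "decrease"
    if z >= k:
        return "increase"
    return "normal"


def classify_nodes(
    baselines: dict[str, list[int]], current: dict[str, int], k: float = 3.0
) -> dict[str, str]:
    """Apply the same check per node, mirroring the node-level alert variants."""
    return {node: classify_volume(baselines[node], count, k) for node, count in current.items()}


if __name__ == "__main__":
    history = [1200, 1180, 1250, 1210, 1190]  # hypothetical per-window counts
    print(classify_nodes({"node-a": history, "node-b": history},
                         {"node-a": 600, "node-b": 1210}))
    # prints {'node-a': 'decrease', 'node-b': 'normal'}
```

Running the check per node as well as instance-wide reflects the split between alerts such as Transaction Decrease and Transaction Decrease Node: a single failing node can drop sharply while the instance-wide total barely moves.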
Key Outcomes
By using Instance Observer alerts, ServiceNow customers can identify and address performance degradation, infrastructure bottlenecks, and anomalous user activity before they escalate. This monitoring supports faster troubleshooting, improved platform reliability, and a better user experience, helping to maintain operational continuity and service delivery.
ServiceNow Instance Observer provides a comprehensive set of alerts designed to monitor platform health, performance, and user experience. These alerts are grouped into categories so that they are easy to review and act on.
- Transactions
  - Monitors application transactions for anomalies, spikes, or performance degradation, such as:
    - Transaction Decrease: Detects a drop in total transaction volume
    - Transaction Decrease Node: Identifies a drop in transaction volume per node
    - Transaction Increase: Flags unexpected transaction surges
    - Transaction Increase Node: Highlights node-level transaction spikes
    - Response Time: Triggers when system-wide response time increases
    - Response Time Node: Flags nodes with degraded response times
    - Database Response Time: Monitors database-level latency that affects transactions
    - Slow Queries Per Second: Tracks the rate of slow database queries affecting responsiveness
- Node health (CPU, memory, or garbage collection)
  - Tracks node infrastructure health to help avoid bottlenecks or failures:
    - Node CPU time: Alerts on high CPU usage on a node
    - Node memory: Monitors memory consumption patterns
    - Node garbage collection time: Tracks time spent in JVM garbage collection (GC)
    - Load balancer container CPU utilization: Flags CPU overload on load balancer (LB) containers
    - Load balancer container memory utilization: Detects memory exhaustion on LB containers
- Database performance and health
  - Covers critical database indicators to verify query health and data reliability:
    - Database host health CPU: High CPU on the primary database host
    - Shards host health CPU: High CPU on shard database hosts
    - Read replica host health (CPU): CPU anomalies on read replica hosts
    - Standby replication lag: Lag in standby database replication
    - InnoDB row lock: Frequency of row lock waits
    - Primary database growth: Flags abnormal growth of the primary database
    - Database table growth: Table-level growth indicators for specific tables
- Inbound and outbound email
  - Promotes timely delivery and ingestion of email-based communications:
    - Outbound email: Delays or failures in outbound email processing
    - Inbound email: Issues in ingesting incoming emails
- Scheduler and job execution
  - Helps detect issues in the job execution life cycle:
    - Scheduler stuck: Scheduler not progressing or blocked
    - Long-running jobs: Jobs exceeding typical run time
    - Specific long-running jobs: Custom job monitoring
    - Thread running: Threads running unusually long or in high volume
- Session and user activity
  - Tracks user login behavior across the instance and individual nodes:
    - User session logged in – Instance: Login activity across the instance
    - User session logged in – Node: Per-node session metrics
- Event queue and semaphore management
  - Critical for debugging platform event handling and job execution throttling:
    - Default semaphore mean: Semaphore wait time trends
    - Default semaphore QDepth: Depth of queued semaphore requests
    - Integrated semaphore: Monitors integrated semaphore contention
    - Event queue check: Tracks backlog in event queues
    - Specific queue for events: Custom event queue monitoring
    - High priority event queue: Monitors mission-critical event queues
    - ECC queue: External communication channel backlog alerts
- Asynchronous Messaging Bus (AMB)
  - Provides observability into the internal messaging bus that supports real-time application behavior:
    - AMB send queue depth: Size of the outgoing message queue
    - AMB send in use: Utilization of AMB sending capacity
- Historical or list data volume
  - Monitors growth of historical or list data that can impact performance:
    - History list length: Flags excessive record count in history tables
- Application host health
  - Monitors health at the application layer:
    - Application host health CPU: Application-tier CPU overload alerts
- AI/ML or intelligent alerting
  - Includes alerts generated via AI/ML-based behavior analysis:
    - Auriga Intelligent: AI-driven anomaly or pattern detection alerts
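The Long-running jobs alert in the Scheduler and job execution category flags jobs exceeding their typical run time. As a rough illustration of that idea, the following minimal Python sketch treats "typical" as the median of each job's past run times and flags running jobs whose elapsed time exceeds a multiple of it. This is an assumption-laden sketch, not Instance Observer's implementation: the median baseline, the `factor` and `min_seconds` values, and the job names are all hypothetical.

```python
# Hypothetical sketch: NOT Instance Observer's implementation.
# Flags currently running jobs whose elapsed time exceeds a multiple of
# their typical (median) historical run time.
from statistics import median


def long_running_jobs(
    history: dict[str, list[float]],
    running: dict[str, float],
    factor: float = 2.0,
    min_seconds: float = 60.0,
) -> list[str]:
    """Return the sorted names of running jobs that exceed their limit.

    history: past completed run times in seconds, keyed by job name (assumed data).
    running: elapsed seconds for jobs that are executing now.
    factor: multiple of the median run time that counts as "long-running".
    min_seconds: lower bound on the limit, so short jobs do not trigger on
        small absolute overruns.
    Jobs without history are skipped because they have no typical run time.
    """
    flagged = []
    for job, elapsed in running.items():
        past = history.get(job)
        if not past:
            continue
        limit = max(factor * median(past), min_seconds)
        if elapsed > limit:
            flagged.append(job)
    return sorted(flagged)


if __name__ == "__main__":
    history = {"nightly_cleanup": [300, 320, 310], "sync_users": [40, 45, 50]}
    running = {"nightly_cleanup": 900, "sync_users": 55, "adhoc_export": 5000}
    print(long_running_jobs(history, running))
    # prints ['nightly_cleanup']
```

The median is used instead of the mean so that a few unusually slow past runs do not inflate the baseline. Restricting `history` to a hand-picked set of job names is one way to approximate per-job monitoring in the spirit of Specific long-running jobs.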