Alerts in Instance Observer
Summary of Alerts in Instance Observer
ServiceNow Instance Observer offers a comprehensive alerting system designed to monitor your platform’s health, performance, and user experience. Alerts are categorized to simplify identification and response to issues across transactions, infrastructure, database health, email processing, job execution, user activity, event handling, messaging, historical data growth, application health, and intelligent AI/ML-based anomaly detection.
Key Features
- Transaction Monitoring: Detects anomalies like volume drops or spikes, response time degradations at system and node levels, and slow database queries impacting performance.
- Node Health Tracking: Monitors CPU, memory, and garbage collection metrics on nodes and load balancer containers to prevent infrastructure bottlenecks or failures.
- Database Performance: Observes CPU usage on database hosts, replication lag, row locking, and abnormal data growth to ensure data reliability and query health.
- Email Processing Alerts: Identifies delays or failures in inbound and outbound email flows critical for communication.
- Scheduler and Job Execution: Flags stuck or long-running jobs and threads to maintain smooth job lifecycle operation.
- User Activity Monitoring: Tracks user login sessions across the instance and individual nodes for security and performance insights.
- Event Queue and Semaphore Management: Provides visibility into event backlog and semaphore contention affecting platform responsiveness.
- Asynchronous Messaging Bus (AMB): Monitors message queue sizes and utilization to ensure real-time application messaging health.
- Historical/List Data Volume: Alerts on excessive data growth in history tables that could degrade performance.
- Application Host Health: Tracks CPU utilization at the application host layer.
- AI/ML Intelligent Alerts: Leverages AI-driven analysis to detect behavioral anomalies and patterns beyond predefined thresholds.
- Alerts Activation and Notification Management: Enables flexible configuration of alerts with historical thresholds and targeted team notifications to align with business needs.
- Integration Capabilities: Supports routing alert notifications to ServiceNow instances and third-party systems via configurable JSON payloads, emails, and SMS for seamless incident management.
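The integration item above mentions routing alert notifications through configurable JSON payloads. As a rough illustration of what building such a payload could look like, the sketch below assembles one in Python. Every field name (`alert_name`, `category`, `severity`, `instance`, `triggered_at`) and the `build_alert_payload` helper are hypothetical; the real payload schema is whatever you configure in Instance Observer and is not described in this article.

```python
import json
from datetime import datetime, timezone


def build_alert_payload(alert_name, category, severity, instance, triggered_at=None):
    """Return a JSON string describing one alert notification.

    All keys are illustrative assumptions, not the Instance Observer schema.
    """
    if triggered_at is None:
        triggered_at = datetime.now(timezone.utc)
    payload = {
        "alert_name": alert_name,
        "category": category,
        "severity": severity,
        "instance": instance,
        # ISO 8601 timestamp so the receiving system can parse it unambiguously
        "triggered_at": triggered_at.isoformat(),
    }
    # Sorted keys keep the output stable, which helps when diffing payloads
    return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    # "example-instance" and "major" are placeholder values
    print(build_alert_payload("Response Time Node", "Transactions", "major", "example-instance"))
```

Keeping the payload construction in one small function makes it easier to adjust the fields once the target system's expected format is known.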
Practical Use for ServiceNow Customers
By using Instance Observer alerts, ServiceNow customers can proactively monitor critical aspects of their platform and quickly detect and address performance degradations, infrastructure issues, and user experience anomalies. The configuration options let teams tailor alert thresholds and notifications to organizational priorities and operational workflows. Integration with ServiceNow and external systems makes alerts actionable within existing ITSM and monitoring environments, which improves incident response efficiency.
Customers can prioritize alerts such as average application response time, long pending jobs by priority, and other popular alerts to maintain optimal instance performance. Notifications appear prominently on the Instance Observer banner, giving teams immediate visibility.
ServiceNow Instance Observer provides a comprehensive set of alerts designed to monitor platform health, performance, and user experience. These alerts are categorized for easy consumption and actionability.
- Transactions
  - Monitors application transactions for anomalies, spikes, or degradations in performance such as:
    - Transaction Decrease: Detects a drop in total transaction volume
    - Transaction Decrease Node: Identifies transaction volume drop per node
    - Transaction Increase: Flags unexpected transaction surges
    - Transaction Increase Node: Highlights node-level transaction spikes
    - Response Time: Triggers when system-wide response time increases
    - Response Time Node: Flags nodes with degraded response times
    - Database Response Time: Monitors database-level latency impacting transactions
    - Slow Queries Per Second: Identifies the volume of slow database queries affecting responsiveness
- Node health (CPU, memory, or garbage collection)
  - Tracks node infrastructure health to avoid bottlenecks or failures:
    - Node CPU time: High CPU usage alert for a node
    - Node memory: Monitors memory consumption patterns
    - Node garbage collection time: Tracks JVM GC delays
    - Load balancer container CPU utilization: Flags CPU overload on LB containers
    - Load balancer container memory utilization: Detects memory exhaustion on LB containers
- Database performance and health
  - Covers critical database indicators to verify query health and data reliability:
    - Database host health CPU: High CPU on primary DB host
    - Shards host health CPU: Resource issues on shard hosts
    - Read replica host health (CPU): Read-replica CPU anomalies
    - Standby replication lag: Lag in standby DB replication
    - InnoDB row lock: Frequency of row lock waits
    - Primary database growth: Flags abnormal growth in primary DB
    - Database table growth: Specific table-level growth indicators
- Inbound and outbound email
  - Promotes timely delivery and ingestion of email-based communications:
    - Outbound email: Delays or failures in outbound email processing
    - Inbound email: Issues in ingesting incoming emails
- Scheduler and job execution
  - Helps detect issues in the job execution life cycle:
    - Scheduler stuck: Scheduler not progressing or blocked
    - Long-running jobs: Jobs exceeding typical run time
    - Specific long-running jobs: Custom job monitoring
    - Thread running: Threads running unusually long or in high volume
- Session and user activity
  - Tracks user login behavior across the instance and its nodes:
    - User session logged in – Instance: Login activity across the instance
    - User session logged in – Node: Per-node session metrics
- Event queue and semaphore management
  - Critical for debugging platform event handling and job execution throttling:
    - Default semaphore mean: Semaphore wait time trends
    - Default semaphore QDepth: Depth of queued semaphore requests
    - Integrated semaphore: Monitors integrated semaphore contention
    - Event queue check: Tracks backlog in event queues
    - Specific queue for events: Custom event queue monitoring
    - High priority event queue: Monitors mission-critical event queues
    - ECC queue: External communication channel backlog alerts
- Asynchronous Messaging Bus (AMB)
  - Internal messaging bus observability for real-time app behavior:
    - AMB send queue depth: Size of outgoing message queue
    - AMB send in use: Utilization of AMB sending capacity
- Historical or list data volume
  - Monitors growth of historical or list data that can impact performance:
    - History list length: Flags excessive record count in history tables
- Application host health
  - Monitors health at the application layer:
    - Application host health CPU: Application-tier CPU overload alerts
- AI/ML or intelligent alerting
  - Includes alerts generated via AI/ML-based behavior analysis:
    - Auriga Intelligent: AI-driven anomaly or pattern detection alerts
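Because the alerts are grouped into categories, one way to act on the catalog above is to route each alert to the team that owns its category. The sketch below models a subset of the categories and alert names listed in this article as a Python dictionary and looks up an owning team for a given alert. The team names, the `TEAM_BY_CATEGORY` mapping, and the helper functions are hypothetical and only illustrate the idea; they are not part of Instance Observer, whose own notification configuration is not shown here.

```python
# Subset of the alert catalog from this article: category -> alert names.
ALERTS_BY_CATEGORY = {
    "Transactions": [
        "Transaction Decrease",
        "Transaction Increase",
        "Response Time",
        "Slow Queries Per Second",
    ],
    "Database performance and health": [
        "Standby replication lag",
        "InnoDB row lock",
        "Database table growth",
    ],
    "Scheduler and job execution": [
        "Scheduler stuck",
        "Long-running jobs",
        "Thread running",
    ],
    "Asynchronous Messaging Bus (AMB)": [
        "AMB send queue depth",
        "AMB send in use",
    ],
}

# Hypothetical ownership mapping: category -> team to notify.
TEAM_BY_CATEGORY = {
    "Transactions": "platform-performance",
    "Database performance and health": "database-admins",
    "Scheduler and job execution": "platform-operations",
    "Asynchronous Messaging Bus (AMB)": "platform-operations",
}

# Hypothetical fallback for alerts that are not in the subset above.
DEFAULT_TEAM = "instance-admins"


def category_for_alert(alert_name):
    """Return the category that contains alert_name, or None if it is unknown."""
    wanted = alert_name.casefold()  # compare case-insensitively
    for category, alerts in ALERTS_BY_CATEGORY.items():
        if any(alert.casefold() == wanted for alert in alerts):
            return category
    return None


def team_for_alert(alert_name):
    """Return the team mapped to the alert's category, or DEFAULT_TEAM."""
    return TEAM_BY_CATEGORY.get(category_for_alert(alert_name), DEFAULT_TEAM)


if __name__ == "__main__":
    for name in ("InnoDB row lock", "Scheduler stuck", "ECC queue"):
        print(name, "->", team_for_alert(name))
```

A plain dictionary keeps the mapping easy to review and extend; alerts missing from the subset fall back to a default team rather than raising an error.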