Anomaly detection algorithm
Summary of Anomaly detection algorithm
The Instance Observer uses an anomaly detection algorithm based on the Z-score statistical model, a univariate method, to analyze performance metrics. This approach helps identify unusual behavior across key system metrics to maintain optimal operation and resource management.
Key Features
- Metrics Analyzed: The algorithm evaluates five metrics: Memory Max, Semaphore Mean, SQL Response Time, Server Response Time, and Transaction Count.
- Detection Models:
  - Z-score Model: Applied to Transaction Count, Server Response Time, and SQL Response Time to detect anomalies based on standard deviations from historical averages.
  - Upper Threshold-Based Methodology: Used for Semaphore Mean, Memory Max, and Job Execution metrics, triggering alerts only when values approach predefined resource exhaustion limits.
- Z-score Calculation: The Z-score compares the current raw metric value (moving average over the past 15 minutes) to the mean and standard deviation calculated from four weeks of historical data matching the same day, hour, and minute.
- Consideration of Cyclic Patterns: The model accounts for natural cyclical data patterns (daily, weekly, seasonal) to avoid false positives by evaluating a cyclicity similarity score over a four-week period, excluding weekends.
Practical Benefits for ServiceNow Customers
- Enables proactive identification of true anomalies in system performance, improving reliability and resource utilization.
- Reduces false alarms by distinguishing between normal cyclical variations and genuine performance issues.
- Combines statistical anomaly detection (Z-score) with threshold-based alerts for critical resource limits, providing a balanced and comprehensive monitoring approach.
- Supports informed operational decisions by providing accurate and context-aware anomaly insights.
Instance Observer performs anomaly detection using the Z-score statistical model, which is a univariate method.
Anomaly detection analyzes a set of five metrics: Memory Max, Semaphore Mean, SQL Response Time, Server Response Time, and Transaction Count. The detection model has been validated against samples from multiple instances, using daily, weekly, and monthly level data.
The Z-score model is used to detect anomalies in Transaction Count, Server Response Time, and SQL Response Time. An upper threshold-based approach is used for Semaphore Mean, Node Max Memory, and Job Execution. Refer to Getting started with Performance charts for details on the five metrics.
Upper threshold-based methodology
The upper threshold-based methodology applies to metrics that have an exhaustion limit. For example, the semaphore mean has a limit of 14 or 16, which the platform uses to cap the number of transactions that can run on a node at one time and so protect the node's resources. Similarly, memory max has a limit of 2 GB, because each node's memory has a predefined maximum capacity. In all such cases, the situation is alarming only when the metric is close to its exhaustion limit. Even if a value deviates well above the mean, it does not raise an alarm as long as it stays below the exhaustion limit.
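The idea can be illustrated with a minimal sketch. The function name `near_exhaustion` and the `alert_fraction` of 0.9 are hypothetical and chosen only for illustration; the actual alerting band that Instance Observer uses is not stated here.

```python
def near_exhaustion(value: float, limit: float, alert_fraction: float = 0.9) -> bool:
    """Return True when a metric is close to its exhaustion limit.

    alert_fraction is an assumed, illustrative setting: the share of the
    limit at which a value is considered "close" to exhaustion.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return value >= alert_fraction * limit


# Semaphore mean with an exhaustion limit of 16.
print(near_exhaustion(9, 16))    # well above a typical mean, but far from the limit
print(near_exhaustion(15, 16))   # close to the limit

# Memory max with a 2 GB (2048 MB) limit.
print(near_exhaustion(1900, 2048))
```

Only the distance to the limit matters in this check; how far the value sits from its historical mean plays no role.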
Z-score methodology
A Z-score is a numerical measurement that describes the relationship of a value to the mean of a group of values. It is expressed as the number of standard deviations from the mean. A Z-score of 0 means that the data point is identical to the mean.
The formula for calculating a Z-score is z = (x - μ) / σ, where:
- x: The raw score, taken as the moving average of the metric over the previous 15 minutes
- μ: The population mean, that is, the average of the values from the previous four weeks on the same day, same hour, and same minute
- σ: The population standard deviation of those same historical values
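The calculation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the product's implementation: the function name `z_score`, the one-sample-per-minute input, and the use of one historical value per week are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Optional, Sequence


def z_score(recent_samples: Sequence[float],
            historical_values: Sequence[float]) -> Optional[float]:
    """Compute z = (x - mu) / sigma.

    recent_samples: samples from the previous 15 minutes (assumed layout).
    historical_values: values at the same day, hour, and minute in each of
        the previous four weeks (assumed to be one value per week).
    Returns None when sigma is 0, because the Z-score is undefined then.
    """
    x = mean(recent_samples)            # moving average of the last 15 minutes
    mu = mean(historical_values)        # population mean of the history
    sigma = pstdev(historical_values)   # population standard deviation
    if sigma == 0:
        return None
    return (x - mu) / sigma


history = [100, 110, 90, 100]       # same slot in each of the last four weeks
print(z_score([130] * 15, history))  # several standard deviations above the mean
print(z_score([100] * 15, history))  # identical to the mean, so z is 0
```

A larger absolute Z-score means the current value sits further from what the same time slot looked like in previous weeks.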
The cyclicity score measures the similarity between two data series, treated as vectors. It helps ensure that the Z-score model provides reliable insights and identifies true anomalies or outliers while accounting for the natural patterns of the data.
The cyclicity score is calculated at the instance level from four weeks of data, excluding weekends, divided into two vectors of two weeks each. The result is the similarity between the two vectors, where a higher score indicates that the trends in the two vectors are more closely aligned.
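As a rough sketch of such a score, the example below compares the two two-week vectors using cosine similarity. Cosine similarity is an assumption made for illustration, since the exact similarity measure is not named here. The function names and the one-value-per-weekday input are likewise hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no direction to compare; treat as no similarity
    return dot / (norm_a * norm_b)


def cyclicity_score(weekday_values: Sequence[float]) -> float:
    """Split four weeks of weekday data (oldest first, weekends already
    removed) into two two-week vectors and return their similarity."""
    half = len(weekday_values) // 2
    first, second = weekday_values[:half], weekday_values[half:2 * half]
    return cosine_similarity(first, second)


# Four weeks with the same weekday pattern: the two halves align closely.
repeating = [10, 20, 30, 20, 10] * 4
print(cyclicity_score(repeating))

# Two halves with no overlap in where activity occurs.
unrelated = [1, 0] * 5 + [0, 1] * 5
print(cyclicity_score(unrelated))
```

With this measure, a score near 1 suggests the second two weeks repeat the pattern of the first two, and a low score suggests little shared pattern.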