Walkthrough guide for AIOPS and Metrics 3: Configure Metrics to create Anomaly Alerts and IT Alerts

RemcoLengers · ‎11-07-2024

This is the third of three blogs. The first one is "Walkthrough guide for AIOPS and Metrics: Getting a custom Metric into the ServiceNow MID Server API" and the second one is "Walkthrough guide for AIOPS and Metrics 2: Configure Metric Explorer and AIOps Dashboard".

In the first blog we set up a MID Server, created a CI and started ingesting a metric for that CI. In the second blog we configured Metric Explorer and set up the start of an AIOPS Dashboard.

In this final blog in the series we are going to see for which metrics bounds were created by Metric Intelligence. Next we will see how to configure what happens when a metric runs out of its bounds.

1. Add more metrics to collect

To get more data to work with, I added some more metrics to the collection script. You can download the script here. Replace the apikey value with the one for your instance. The script may need to collect data for a few days before the ML results start to appear.
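The downloadable script is not reproduced in this post. As a rough idea of what adding more metrics to such a script could look like, here is a minimal Python sketch. The endpoint URL, the "Key" authorization header format and the payload field names are assumptions for illustration only; use the values from the first blog and from your own MID Server setup.

```python
import json
import time
import urllib.request

# Hypothetical placeholders: replace with the values for your own MID Server.
MID_URL = "http://mid-server.example.com:8097/api/mid/sa/metrics"
API_KEY = "replace-with-your-apikey"


def build_sample(metric_type, value, ci_name, source="custom-script", timestamp_ms=None):
    """Return one metric sample as a dict (field names are assumptions)."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # epoch time in milliseconds
    return {
        "metric_type": metric_type,
        "value": float(value),
        "timestamp": timestamp_ms,
        "node": ci_name,
        "source": source,
    }


def build_request(samples, url=MID_URL, api_key=API_KEY):
    """Wrap a list of samples in a JSON POST request without sending it."""
    body = json.dumps(samples).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Key " + api_key,  # assumed header format
        },
        method="POST",
    )


# Collect several metrics for the same CI in one batch.
batch = [
    build_sample("cpu_temperature", 41.5, "my-custom-ci"),
    build_sample("fan_speed", 1200, "my-custom-ci"),
]
request = build_request(batch)
print(json.dumps(batch, indent=2))
# urllib.request.urlopen(request) would send the batch to the MID Server.
```

Sending several metrics per CI in one batch keeps the script simple to extend: each new metric is one more build_sample line.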

2. Review the ML results

In the first blog we set up the CI view to include the "Metric Time Series Models". We now return to this view.

You can see the ML classifier selected for each metric. You can also see that for some metrics no classifier could be selected, which means those metrics do not lend themselves to bounds calculations. Information about that can be found here. More detailed information and settings on the Metric bounds sensitivity can be found here.

In Metrics -> Insights Explorer you can see the bounds that are calculated for each metric.

Another metrics example, from a different CI with a different set of metrics, is shown below for demonstration purposes.

3. From Bounds to Anomaly Alerts

By default, all collected metrics have the “Bounds” level, meaning upper and lower bounds are calculated and a model is built for the metric, but no anomaly alerts are generated. Carefully consider which metrics you want to use for anomaly detection, as too many false positives may lead the people handling the alerts to ignore them in the long run.
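To make the idea of bounds concrete, here is a small Python sketch. It is not how Metric Intelligence builds its models; it assumes a simple stand-in technique (mean plus or minus k standard deviations over a history of samples), and the function names and the value of k are hypothetical.

```python
from statistics import mean, stdev


def compute_bounds(history, k=3.0):
    """Return (lower, upper) as mean -/+ k sample standard deviations."""
    if len(history) < 2:
        raise ValueError("need at least two samples to estimate a spread")
    centre = mean(history)
    spread = stdev(history)
    return centre - k * spread, centre + k * spread


def is_anomalous(value, bounds):
    """A value is anomalous when it falls outside the (lower, upper) bounds."""
    lower, upper = bounds
    return value < lower or value > upper


# Illustrative history of a metric that hovers around 50.
history = [50, 52, 49, 51, 50, 48, 53, 50]
bounds = compute_bounds(history)

print(bounds)
print(is_anomalous(51, bounds))  # a typical value, inside the bounds
print(is_anomalous(90, bounds))  # far above the upper bound
```

A smaller k gives tighter bounds and more anomalies, which is the same trade-off against false positives described above.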

To follow the next part it helps to understand the order Metric Anomaly -> IT Alert. This order is configured with the Metric Config Rules in this section, which define the action taken when a single metric goes out of bounds.

The order Metric Anomaly -> Anomaly Alert(s) -> IT Alert is configured with the Advanced Promotion Engine in the next section. This gives greater control, so that IT Alerts are generated only once a number of metrics go out of bounds. For now we will configure what happens to a single metric.

To set up the configuration, go to Metric Config Rules.

Set up a rule so that the conditions select the metric(s) you want to include. You can also select individual metrics by making "Metric Type Id" a selection criterion. When the Preview shows the correct selection, right-click the top bar and choose Save. Then use New to determine what the outcome needs to be: for the name use "anomaly_detection_action_level", and for the Choice see below.

If you have a metric for which you want to create IT Alerts directly, select IT Alerts in the Choice field.

If you want multiple metrics to be anomalous before an IT Alert is created, select Anomaly Alerts. Let's go with the latter and set that up in the next step.

So now, when one or more metrics go out of their bounds, Anomaly Alerts are created.
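The effect of the three choices for "anomaly_detection_action_level" can be summarised in a few lines of Python. This is only a sketch of the behaviour described in this section, not ServiceNow code; the enum, the function and the returned strings are hypothetical names.

```python
from enum import Enum


class ActionLevel(Enum):
    # The three choices discussed in this section.
    BOUNDS = "Bounds"
    ANOMALY_ALERTS = "Anomaly Alerts"
    IT_ALERTS = "IT Alerts"


def outcome(value, lower, upper, level):
    """Return what a single metric sample leads to under a given action level."""
    if lower <= value <= upper:
        return "none"           # inside the bounds: nothing happens
    if level is ActionLevel.BOUNDS:
        return "none"           # bounds are calculated, but no alert is raised
    if level is ActionLevel.ANOMALY_ALERTS:
        return "anomaly_alert"  # candidate for promotion in the next section
    return "it_alert"           # IT Alert created directly


for level in ActionLevel:
    print(level.value, "->", outcome(95, lower=40, upper=60, level=level))
```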

A view of the Anomalies and their severity in the Insight Explorer view

4. From Anomaly Alerts to IT Alerts

In the previous steps we selected "Anomaly Alerts" as the outcome when a Metric Anomaly occurs. With the Advanced Promotion Engine we can take a more granular approach to creating IT Alerts for a CI class. The inputs are a CI filter, the minimum Anomaly Alert severity, the number of Anomaly Alerts, and the time window in which they need to occur together. This allows for a configuration that requires more than one metric to be anomalous within a set period of time, reducing the chance of too many false positives.

Select New

Fill in the following fields. Note the field stating that no IT Alert will be created for CIs younger than 7 days. CIs younger than 7 days may not yet have valid data and ML applied to them, hence the grace period.

Fill out the lowest anomaly Severity taken into consideration

Fill out the minimum Number of Alerts that need to have Anomaly Alerts open

Fill out the Time Window in which those Anomaly Alerts need to coincide
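The fields above combine into a single decision. The following Python sketch shows one way to read that logic. It is an illustration under assumptions, not the Advanced Promotion Engine itself: the class names, the severity ranking (a higher number meaning more severe) and the should_promote function are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical ranking for this sketch: higher number = more severe.
    WARNING = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass
class AnomalyAlert:
    metric: str
    severity: Severity
    opened_at: datetime


@dataclass
class PromotionRule:
    min_severity: Severity      # lowest severity taken into consideration
    min_alerts: int             # minimum number of open Anomaly Alerts
    time_window: timedelta      # window in which the alerts must coincide
    min_ci_age: timedelta = timedelta(days=7)  # grace period for new CIs


def should_promote(rule, alerts, ci_created_at, now):
    """Return True when the Anomaly Alerts of one CI warrant an IT Alert."""
    if now - ci_created_at < rule.min_ci_age:
        return False  # CI is too young to have reliable bounds
    window_start = now - rule.time_window
    qualifying = [
        a for a in alerts
        if a.severity >= rule.min_severity and window_start <= a.opened_at <= now
    ]
    return len(qualifying) >= rule.min_alerts


now = datetime(2024, 11, 7, 12, 0)
alerts = [
    AnomalyAlert("cpu_temperature", Severity.MAJOR, now - timedelta(minutes=5)),
    AnomalyAlert("fan_speed", Severity.CRITICAL, now - timedelta(minutes=9)),
    AnomalyAlert("disk_io", Severity.WARNING, now - timedelta(minutes=2)),
]
rule = PromotionRule(Severity.MAJOR, min_alerts=2, time_window=timedelta(minutes=15))
print(should_promote(rule, alerts, ci_created_at=now - timedelta(days=30), now=now))
```

In this example two of the three alerts meet the minimum severity inside the window, so the rule would promote them to an IT Alert; the same alerts on a CI created three days ago would not.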

5. Results and outcome

When enough metric anomalies occur to trigger the advanced promotion rule, an Alert will be visible in Service Operations Workspace -> Express List.

Open that Alert ID to view the details.

Then view the Metrics tab to see the featured metrics.

Then, via the icon in the top right of each metric, Metric Explorer can be accessed.

This completes the loop of ingesting a custom metric, configuring how to deal with Metric Anomalies, and determining how they may create an IT Alert for the CI class. It also ties the metrics into the Alert process, so that operators can use them for troubleshooting during the Alert/Incident handling phase. This is a good step up in your AIOPS journey.

Notes:

Manually setting a Metric (threshold) rule
For more advanced use cases it is possible to manually specify upper and lower bounds.
Besides custom metric ingestion, there are other ways to collect metrics: Event Management -> Connector Definitions

And with the use of ACC-M (Monitoring): Service Operations Workspace -> Integration launchpad -> ACC-M
