AIOps - Use ML to Automate Diagnostics and Resolution

jasonsmith · ‎01-22-2023

Incident identification is easy when an impacted user calls the Service Desk. By that time it is too late, though, which is why SvcOps teams apply multiple techniques to identify issues before customer cases are opened. For example, monitoring is used to gauge the health of IT systems. If application service vital signs are bad, a proactive Incident can be created before a customer is impacted and opens a case. Incident identified, diagnostics begin, resolution and recovery start sooner, everyone is happy, right? Not really. Being proactive does not necessarily mean that your workflow is optimized.

Proactive Incident creation often means that human resources still need to start diagnostic routines.
For monitoring to create the Incident in the first place, it needs to know what to look for, such as excessive CPU or memory usage. The vital signs being monitored often indicate that application service health looks wrong, but those signals alone do not tell you the root cause.
Monitoring vital signs are often missed because they are too noisy or have simply been turned off by an operator manually tuning the system.
Monitoring rules are created to instruct the system to look for additional known errors. Each of these rules has its own lifecycle and has to be maintained, which can get expensive quickly.
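
To make the rule-maintenance point concrete, here is a minimal sketch of what a single static monitoring rule might look like. It is illustrative only and is not ServiceNow code: the ThresholdRule class, the breaches function, the metric name and the 90% limit are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ThresholdRule:
    """Hypothetical static monitoring rule (not a ServiceNow object)."""
    metric: str        # name of the vital sign, e.g. "cpu_percent"
    limit: float       # value above which a sample counts as a breach
    consecutive: int   # how many breaching samples in a row are required


def breaches(rule: ThresholdRule, samples: list[float]) -> bool:
    """Return True if `samples` has `rule.consecutive` readings in a row above `rule.limit`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > rule.limit else 0
        if run >= rule.consecutive:
            return True
    return False


# Assumed example values: alert after three CPU samples in a row above 90%.
cpu_rule = ThresholdRule(metric="cpu_percent", limit=90.0, consecutive=3)
sustained = breaches(cpu_rule, [50, 95, 96, 97])    # three breaches in a row
interrupted = breaches(cpu_rule, [95, 50, 96, 97])  # run is broken by the 50
print(sustained, interrupted)
```

Even this small rule has three values that someone has to choose and revisit. It also only reports that CPU is high; it says nothing about why, which is the diagnostic gap described above.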

Key word alert! "Diagnostics" is mentioned multiple times in this article.

The graphic below represents the Incident process. Have a close look at Incident Response, the third major stage in Incident management. Even when the Incident is created automatically, a few steps still typically involve human intervention. This is where the Incident Reassignment Count increases, the SLA clock is ticking, and many other things can happen before initial diagnosis, resolution and recovery begin.

Investigation and Diagnosis is a tricky stage and often requires an SME. Using monitoring to open Incidents is of course better than waiting for a customer to open a case. If you always knew exactly what to look for beforehand, the mechanism would work quite well. In reality it can be very difficult to know what to look for, and creating rules for known conditions carries real project and human resource cost. This is the sweet spot where AIOps comes in.

Have you ever opened a case? What's the first thing support asks for? The logs! They need them to eventually start diagnostics.

The ServiceNow AIOps platform can automate diagnostic routines on your behalf by correlating log anomalies from disparate systems. These log anomalies are then correlated with other signals like Metrics, Events and Tasks to complete diagnostics so that you can begin resolution and recovery. It is root cause analysis for the cases where you did not know what to look for beforehand. Ash Poxon of National Grid (https://www.nationalgrid.com/) wrote about this recently, so have a look at his post on LinkedIn. He gives a great explanation of how ServiceNow AIOps provides a massive reduction in the time it takes to get to resolution and recovery.
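
As a rough mental model of the correlation step, the sketch below groups each log anomaly with other signals that occurred close to it in time. This is a simplified illustration under stated assumptions, not how ServiceNow AIOps is implemented: the Signal class, the correlate function and the five-minute window are hypothetical, and a real platform would use far richer logic than time proximity alone.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Signal:
    """Hypothetical signal record (not a ServiceNow table or API)."""
    kind: str      # "log_anomaly", "metric", "event" or "task"
    source: str    # system that produced the signal
    at: datetime   # when the signal was observed


def correlate(anomalies, others, window=timedelta(minutes=5)):
    """Pair each log anomaly with the other signals seen within `window` of it."""
    groups = []
    for anomaly in anomalies:
        related = [s for s in others if abs(s.at - anomaly.at) <= window]
        groups.append((anomaly, related))
    return groups


# Assumed example data: one log anomaly, one nearby metric, one distant event.
t0 = datetime(2023, 1, 22, 10, 0)
anomaly = Signal("log_anomaly", "app-server", t0)
others = [
    Signal("metric", "db-host", t0 + timedelta(minutes=2)),
    Signal("event", "load-balancer", t0 + timedelta(minutes=30)),
]
groups = correlate([anomaly], others)
for a, related in groups:
    print(a.source, "->", [s.source for s in related])
```

The point of the sketch is the shape of the output: each anomaly arrives with the surrounding evidence already attached, rather than a person collecting logs and metrics from each system by hand after the Incident is opened.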

The short story is that ServiceNow AIOps automates Incident Response instead of just automating Incident Identification. AIOps transforms the Incident process by using Machine Learning to identify root cause before Incidents are opened. Tracking "just is" in the ServiceNow platform: the data you need for related KPI reporting is already there, and so are the KPI reports. It is time to take advantage of information-driven outcomes.

