ServiceNow Best Practice for major outages

mcr1
Kilo Explorer

We are currently on Jakarta and are considering an upgrade to Kingston.

What is ServiceNow's recommended best practice for handling an outage?


Definition: Outage

An issue that prevents 25% of a department from functioning properly, or one that directly affects revenue.
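To make the threshold unambiguous, the definition can be written as a simple rule. This is a minimal Python sketch. The function and parameter names are hypothetical, and it assumes "25% of a department" means 25% or more of the department's headcount.

```python
# Hypothetical helper encoding the outage definition above.
# Assumption: "25% of a department" is read as 25% or more of its headcount.
OUTAGE_THRESHOLD = 0.25


def is_outage(affected_users: int, department_size: int, affects_revenue: bool) -> bool:
    """Return True if an issue meets the outage definition."""
    if department_size <= 0:
        raise ValueError("department_size must be positive")
    # Either condition on its own is enough to classify the issue as an outage.
    return affects_revenue or affected_users / department_size >= OUTAGE_THRESHOLD


print(is_outage(10, 40, False))  # 10/40 = 0.25, meets the threshold
print(is_outage(3, 40, False))   # 3/40 = 0.075, below the threshold
print(is_outage(1, 40, True))    # direct revenue impact
```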


Currently, we are using Incidents for individual or noncritical issues. For major outages, we are using Problems.

I am aware that this is not the correct way to use them, so I would like to know what is recommended.



Uncle Rob
Kilo Patron

Always start with WHY, not HOW.

What outcome are you trying to achieve?
- Identify the sources and costs of outages so you can eliminate root causes?
- Respond faster?

The WHY of the solution will significantly inspire the HOW. I've seen no fewer than 10 "best practice" major incident management solutions all die days after go-live... all because the focus was on How, not Why.

mcr1
Kilo Explorer

Primarily documentation and reporting.


This will be used so that the correct group (those working on it and the people they report to) can see who is working on it and what is being done. It will also be used to document the RCA/after-action report.


Additionally, it will be used for reporting on these specific types of outages, separately from Incidents and Problems.

Well, there you go. None of that necessitates using Problem for outages.

The *best* scenario would be an event management solution that determines the compromised CIs, rolls them up to a business service, and creates a single Incident & Outage record. Without that level of CMDB/Event maturity, I'd look at creating Outage records manually once the cause of the outage has been identified.
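The roll-up described above can be sketched roughly as follows. This is a minimal Python illustration, not ServiceNow's Event Management API: the CI-to-service map, the Event/Outage classes, the roll_up function and the incident numbering are all hypothetical, and it assumes each CI belongs to exactly one business service.

```python
from dataclasses import dataclass, field

# Hypothetical CMDB stand-in; none of these names are ServiceNow APIs.
# Assumption: each CI maps to exactly one business service.
CI_TO_SERVICE = {
    "db01": "Online Store",
    "web01": "Online Store",
    "web02": "Online Store",
    "mail01": "Email",
}


@dataclass
class Event:
    ci: str
    healthy: bool


@dataclass
class Outage:
    service: str
    cis: list = field(default_factory=list)
    incident_number: str = ""


def roll_up(events: list[Event]) -> dict[str, Outage]:
    """Group compromised CIs by business service: one Incident + Outage per service."""
    outages: dict[str, Outage] = {}
    for ev in events:
        if ev.healthy:
            continue  # only compromised CIs contribute to an outage
        service = CI_TO_SERVICE.get(ev.ci)
        if service is None:
            continue  # CI missing from the hypothetical map; needs manual triage
        outage = outages.get(service)
        if outage is None:
            # First compromised CI for this service: open a single (hypothetical) incident.
            outage = Outage(service=service, incident_number=f"INC{len(outages) + 1:07d}")
            outages[service] = outage
        if ev.ci not in outage.cis:
            outage.cis.append(ev.ci)  # record each CI once, even if it alerts repeatedly
    return outages


events = [Event("db01", False), Event("web01", False), Event("web01", False), Event("mail01", True)]
for o in roll_up(events).values():
    print(o.service, o.incident_number, o.cis)
```

Grouping by business service is what keeps one outage from spawning an Incident per failing CI. Without a trustworthy CI-to-service map, that grouping step is exactly what falls back to creating Outage records manually.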

That'll keep your Incident & Problem management modules "pure" while preparing for greater CMDB/Event maturity down the road.

shruti_tyagi
ServiceNow Employee

I would use incidents to raise outages and problems to work on root cause analysis for major P1s.

First, provide relief through the incident; second, work on the RCA through the problem (PRB) record.

Hope this helps.

Shruti