Outage record best-practice?

davidmcdonald
Kilo Guru

G'day all.

I'd like to know more about how some organisations maintain outage records.

From what I understand, outage records typically represent an outage or impact to a service, planned or unplanned, with a target audience from the IT department out to the end user. There are also a number of nifty ways that ServiceNow uses and displays this information, such as the CMDB Health Dashboard, where you can see upcoming and historical impacts to a CI, and the Service Portal "Current Status" page, which displays the current status of services.

The out-of-the-box way that outage records are maintained is that it's all manual, and someone needs to personally take action and create the outage record. Docs pages are here and here.

How do you work with them? What do you consider best-practice for outages?

  • Do you utilise outages in ServiceNow?
  • Who is responsible for maintaining outages?
    Is there a specific someone or a group in your organisation responsible for maintaining outage records, or do you rely on fulfillers to maintain outage records?
  • Have you created a method of simplifying the creation of outage records?
    E.g. a record producer, or a pop-up on a form.
  • Have you added some functionality so that outage records are created and / or updated automatically?
3 REPLIES

Ulrich Jugl
ServiceNow Employee

Let me share my personal opinion and findings with you. I'm not saying these are ServiceNow best practice, or advice you have to follow at all.

I have seen customers using Outages heavily, but you have to consider that you also need to get your Business Services and your Portfolio management right before this makes real sense. Nobody benefits from knowing that emailserver001@yourcompany is offline, but everyone understands a warning message on your Service Portal saying that IT is aware of restrictions in email communication and is working on it.

From what I have seen with these customers, the management of Outages and the proper reporting on the data typically falls under the responsibility of the Incident Process Manager. It is considered part of his/her KPIs to keep outages to a minimum and to report on them when one occurs.

What I have seen as well is some additional logic to automatically create Outages for P1 incidents, leaving the manual way only for P2 or lower incidents. For a P1 incident it can be as easy as creating the outage automatically with the experienced start date of the incident and then updating the outage record on close/resolution of the P1. That way you have minimal to no work with the most critical outages happening in your organisation. For all other records, you want to add some logic to make sure that, if an Outage was created, it is also properly closed on resolution of the incident.
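
As a rough illustration of that P1 logic, here is a sketch in plain Python rather than an actual ServiceNow business rule. The `Incident` and `Outage` classes, their field names, and the two handler functions are all made up for this example and are not the platform's tables or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Outage:
    # Hypothetical stand-in for an outage record.
    ci: str
    begin: datetime
    end: Optional[datetime] = None


@dataclass
class Incident:
    # Hypothetical stand-in for an incident record.
    number: str
    priority: int
    ci: str
    experienced_start: datetime
    outage: Optional[Outage] = None


def on_incident_saved(incident: Incident) -> Optional[Outage]:
    """Create an outage automatically for a P1 that does not have one yet."""
    if incident.priority == 1 and incident.outage is None:
        incident.outage = Outage(ci=incident.ci, begin=incident.experienced_start)
    return incident.outage


def on_incident_resolved(incident: Incident, resolved_at: datetime) -> None:
    """Close any still-open outage linked to the incident, whatever its priority."""
    outage = incident.outage
    if outage is not None and outage.end is None:
        outage.end = resolved_at
```

The second function covers the "all other records" case too: a manually created outage on a P2 or lower incident is closed on resolution the same way.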

That is some good knowledge, thanks for sharing it Ulrich.

williamsun
Mega Guru

I have to agree with @Ulrich Jugl on both counts.

An incident usually reports on a specific CI that might affect more than one Service, and not all companies have Service Maps created and kept up to date to determine impact. Also, an Outage depends on the SLA as well: the incident might have caused unavailability for 5 services, but only 3 of those were supposed to be up at that time.
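
To make the SLA point concrete, here is a small Python sketch. The service names and agreed service hours are invented for illustration, and the windows are simplified to same-day hours.

```python
from datetime import datetime, time

# Hypothetical agreed service hours (start, end) per service; same-day windows only.
SERVICE_HOURS = {
    "Email": (time(0, 0), time(23, 59)),
    "Web Shop": (time(0, 0), time(23, 59)),
    "Batch Reporting": (time(20, 0), time(23, 59)),
    "Payroll": (time(8, 0), time(18, 0)),
    "Intranet": (time(7, 0), time(19, 0)),
}


def services_in_outage(affected, at):
    """Return the affected services that were supposed to be up at time `at`."""
    result = []
    for name in affected:
        start, end = SERVICE_HOURS[name]
        if start <= at.time() <= end:
            result.append(name)
    return result


# An incident at 22:00 makes all five services unavailable,
# but only three of them were inside their agreed service hours.
incident_time = datetime(2024, 1, 1, 22, 0)
print(services_in_outage(list(SERVICE_HOURS), incident_time))
# prints ['Email', 'Web Shop', 'Batch Reporting']
```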

On the other point, automating is really tricky. When does the outage start and end? Was the incident created immediately when the outage occurred, or minutes later? Did the incident spend 2 hours as a performance issue before escalating? When the incident was resolved, do you need any further time added to the incident for additional field services on the client side?

I have always thought that the Outage record needs to be kept manual because it goes with the RCA, which determines all the details and defines the outage start and end. That is also my opinion on who manages the record: whoever manages the RCA, be it the Major Incident Manager, the Service Level Manager or the Service Delivery Manager.