When is a Major Incident a Problem?

ChrisPope
Tera Expert

A common thread I come across on my travels when visiting customers!


There seem to be differing schools of thought: one holds that every Major Incident (a definition in its own right!) is a problem, while a more selective approach holds that not every Major Incident is a problem. The confusion only gets worse when a Post Incident Review or Post Mortem (a.k.a. RCA) is required as a result of the incident.


We all know the ITIL definition, and again, it's only a framework to work from, but I'm interested to hear what others have to say about it.


Chris

1 ACCEPTED SOLUTION

jeff_allen
ServiceNow Employee

I will assume that this question arises as an organization is implementing a comprehensive service management approach and evolving away from overloading the incident process/tool it has been using. In response to the question, I would say that as many organizations expand into Problem Management, they usually make a judgment call as to whether a major incident is a problem. However, I usually suggest they also consider how they want to capture knowledge and perform reporting. That usually drives them to embrace the Problem process and application to handle those aspects, along with adding some indicator to incidents to signify a major incident. This gives them flexibility in deciding what a major incident is, while also enabling evolution to a more robust service management environment that includes Problem Management. When they are ready or interested, they can make it more systematic/programmatic.
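One way to picture the "judgment call made systematic" idea is a major-incident indicator on the incident plus a rule that decides whether a linked Problem is raised. The sketch below is a minimal illustration in plain Python, not a ServiceNow API: the `major` and `problem_number` fields and the `raise_problem_if_needed` function are hypothetical names. The `every_major_is_problem` switch models the two schools of thought from the original question.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Incident:
    number: str
    short_description: str
    major: bool = False                   # hypothetical major-incident indicator
    problem_number: Optional[str] = None  # link to a Problem, if one exists


@dataclass
class Problem:
    number: str
    short_description: str
    incident_numbers: List[str] = field(default_factory=list)


def raise_problem_if_needed(incident: Incident, new_number: str,
                            every_major_is_problem: bool = True) -> Optional[Problem]:
    """Make the 'is this major incident a problem?' decision explicit.

    every_major_is_problem=True: every major incident gets a Problem.
    every_major_is_problem=False: nothing is created automatically and the
    decision stays with a person (the selective approach).
    """
    if incident.problem_number is not None:
        return None  # already linked to a Problem; do not create a duplicate
    if not (incident.major and every_major_is_problem):
        return None
    problem = Problem(
        number=new_number,
        short_description=f"Root cause for {incident.number}: {incident.short_description}",
        incident_numbers=[incident.number],
    )
    incident.problem_number = problem.number  # record the link on the incident
    return problem


if __name__ == "__main__":
    outage = Incident("INC001", "Order entry unavailable", major=True)
    prb = raise_problem_if_needed(outage, "PRB001")
    print(prb.number, outage.problem_number)  # PRB001 PRB001
```

Keeping the link as a record number rather than an object reference mirrors how the Problem can then carry the knowledge and reporting aspects independently of the incident.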



4 REPLIES

danielbilling
Kilo Guru

In my experience, it often comes down to the size of the organisation. In larger organisations there is (in most cases) a need for one team that focuses only on communication (escalation) and on resolving the incident. Is that the most efficient way? I don't believe it is. These ways of working are based on a long history and deeply ingrained cultural behaviour. A problem ticket is created after the incident is resolved and is mainly focused on finding the root cause. I share your view, Chris, that there is a lot of confusion (and additional work) in doing both the major incident review and the root cause analysis. Very often, it's the same people in both processes.


When it comes to smaller organisations, it becomes a matter of keeping it simple. I always try to find a driver for doing something new or changing the way things work: for example, starting a problem process to find out where you should spend your money, or as input for building a good knowledge base.


rumagoso
Kilo Contributor

An incident becomes a Major Incident (MI) because of its impact on the customer's business (for example, a factory has stopped or sales cannot be made). There may be a Problem behind it, but the pressure to solve the incident at (almost) any cost makes resolving the MI the immediate focus. After it has been solved (or at least temporarily solved via a working workaround, such as updating data in a database to make it consistent so that processing can proceed), a decision on pursuing a definitive solution can be taken, and a Problem will represent that effort.


Daniel is spot on. In a larger org, incident and problem will tend to have dedicated people/functions and the return will be clearer; in smaller orgs, we need a good reason to separate these processes with their different focus (KM is a good one!).


The confusion arises because neither MI nor Problem is standard work; both are rather Cases (of varying complexity), so they overlap in the people involved, the techniques used, and the evidence needed. But an MI is a customer-centric, immediate effort (which may need a dedicated task force with external experts), whereas a Problem is an investment in removing errors on the service provider's side.


ravi1_tandon
Kilo Guru

I would suggest it is all tied to the service being impacted and its $ value. The service tier categorizations, and the modules/classes tied to each service, determine whether an impact on any of them results in a major incident.



For any organization adopting a service-based approach, it is always advisable to hold whiteboard discussions before jumping directly to the system and its resulting output.



To define the service approach, I would suggest the organization start with the CMDB and group all business services and their components into categories defined by business impact, tied to the $ value of that impact.



At least, this is the approach I follow in most of my client implementations.
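The tiering approach in this reply (business services categorized in the CMDB by the $ value of their business impact, with the tier deciding whether an outage counts as a major incident) can be sketched as below. This is a minimal illustration under stated assumptions: the dollar bands, service names, component names and the functions `tier_for` and `is_major_incident` are all hypothetical, and a real implementation would read services and relationships from the CMDB instead of hard-coding them.

```python
from typing import Iterable, Set

# Hypothetical $ bands (estimated loss per hour of outage) mapped to a service tier.
# Tier 1 is the most business-critical: a lower number means a higher impact.
TIER_BANDS = [(100_000, 1), (10_000, 2)]  # ordered from the highest $ floor down
DEFAULT_TIER = 3

# Hypothetical CMDB extract: business service -> estimated $ impact per hour.
SERVICE_IMPACT = {"Order Entry": 250_000, "Payroll": 40_000, "Intranet": 500}

# Hypothetical CMDB relationships: component -> business services depending on it.
DEPENDENTS = {
    "db-cluster-01": ["Order Entry", "Payroll"],
    "hr-app-01": ["Payroll"],
    "web-02": ["Intranet"],
}


def tier_for(hourly_impact_usd: float) -> int:
    """Assign a tier from the first band whose $ floor the impact reaches."""
    for floor, tier in TIER_BANDS:
        if hourly_impact_usd >= floor:
            return tier
    return DEFAULT_TIER


def impacted_services(components: Iterable[str]) -> Set[str]:
    """Collect every business service that depends on an impacted component."""
    services: Set[str] = set()
    for ci in components:
        services.update(DEPENDENTS.get(ci, []))
    return services


def is_major_incident(components: Iterable[str], major_tier: int = 1) -> bool:
    """Major if any impacted service is at major_tier or more critical."""
    return any(tier_for(SERVICE_IMPACT[s]) <= major_tier
               for s in impacted_services(components))
```

With these example figures, an outage of `db-cluster-01` is major because it takes down the Tier 1 "Order Entry" service, while an outage of `hr-app-01` (Payroll only, Tier 2) is not, unless the organization lowers the bar with `major_tier=2`.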