Question about Major Incident VS Problem Management

zeba1
Kilo Contributor

We are configuring ServiceNow and going live next month. I am being told that the Parent/Child feature in Incident Management is ONLY to be used if we are not implementing Problem Management. The advice is that, because I am implementing Problem Management, every time there is a Major Incident I should automatically create a problem record and link all the incidents to the problem, rather than use the Parent/Child relationship to manage outages (multiple calls caused by the same incident).
Can someone please tell me whether the Parent/Child feature is only for sites that do not implement Problem Management, or whether we should use both to manage Major Incidents and problems the way they should be? I would like to use both, so I can manage Major Incidents in Incident Management and, if it's a recurring issue, open a problem ticket to investigate, do root cause analysis, and prevent future occurrences.

Thank you,

8 REPLIES

Mark Stanger
Giga Sage

You can use both and I would recommend that you do. Major incidents and problems are 2 separate things with 2 separate processes. The situation you describe is the perfect reason to use both.


troyp
Giga Contributor

They are clearly intertwined with different goals and processes. I also expect that knowledge management will be involved to capture and communicate status/progress.

Has anyone implemented major incident, problem, and knowledge management together and want to share the wealth?


Mike Malcangio
ServiceNow Employee

I actually just responded to a similar question here:

Major Incident Process

Here are my thoughts:

I've seen this done a few different ways in the tool -- as Matt alluded to -- it's really dependent on your process.

I will say this -- ITIL v3 -- isn't very prescriptive in this case. It says there should be a major incident process; it doesn't really specify what that should be. This leads to all kinds of religious arguments (go search for the same topic on LinkedIn) about the process and thus different implementations in the product. I think it speaks strongly about the flexibility of the tool that it will support whatever you would like to do.

The two major camps:

Major Incident is a "kind" of incident. Perhaps it's when the urgency and impact are both set to High and therefore you have a 'Critical' incident. You may then kick in emails to certain teams based on it being a "critical" incident and have different SLAs for it, etc. In the background, you may have a crisis communication plan.
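To make the first camp concrete, here is a minimal sketch in plain Python (not ServiceNow script). It assumes impact and urgency are stored as numbers with 1 meaning High; the SLA hours and notification group names are made up for illustration.

```python
from dataclasses import dataclass

HIGH = 1  # assumption: 1 = High, 2 = Medium, 3 = Low


@dataclass
class Incident:
    number: str
    impact: int
    urgency: int


def is_major(incident: Incident) -> bool:
    """Camp one: a major incident is just an incident with High impact and High urgency."""
    return incident.impact == HIGH and incident.urgency == HIGH


def handling_for(incident: Incident) -> dict:
    """Pick a hypothetical SLA and notification list based on the major flag."""
    if is_major(incident):
        return {"sla_hours": 4, "notify": ["crisis-team", "service-desk"]}
    return {"sla_hours": 24, "notify": ["service-desk"]}


outage = Incident("INC0001", impact=1, urgency=1)
routine = Incident("INC0002", impact=3, urgency=2)
```

The point is only that "major" is derived from fields already on the incident, so no new record type is needed in this camp.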

This naturally leads to another question, and the genesis of the second camp -- if the "major incident" is affecting multiple customers -- how do I capture that in the tool?

To tackle this, some folks create a new task type called "major incident" and then build a parent-child relationship between major incidents and "plain old" incidents. This allows them to group and collect the incidents and then close and notify en masse. The nice thing about this approach is it affords you a lot of easy customization opportunities for how you handle "major incidents" vs. "plain old" incidents.
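A rough sketch of that parent-child grouping, again in plain Python rather than anything platform-specific. The class names, fields, and message format are all hypothetical; it only shows the "resolve the parent, resolve and notify every child" idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Incident:
    number: str
    caller: str
    state: str = "open"


@dataclass
class MajorIncident:
    number: str
    description: str
    state: str = "open"
    children: List[Incident] = field(default_factory=list)

    def attach(self, incident: Incident) -> None:
        """Group a plain incident under this major incident."""
        self.children.append(incident)

    def resolve(self, note: str) -> List[str]:
        """Resolve the parent and every open child; return one notification per child resolved."""
        self.state = "resolved"
        messages = []
        for child in self.children:
            if child.state != "resolved":
                child.state = "resolved"
                messages.append(
                    f"{child.caller}: {child.number} resolved via {self.number}. {note}"
                )
        return messages


major = MajorIncident("MI0001", "Email outage")
major.attach(Incident("INC0010", "alice"))
major.attach(Incident("INC0011", "bob"))
messages = major.resolve("Mail server restarted.")
```

Because the major incident is its own type here, anything special (extra fields, separate SLAs, different notifications) can hang off it without touching plain incidents.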

Other folks, rather than build a separate task type, leverage Problem for the same function. The argument being that most of the functionality already exists to do this in problem (associating multiple incidents to a single problem) and 99% of the time you're going to create a problem to get to the root cause of the major incident anyway. Might as well just build crisis handling into the problem workflow and then once the fire is out, let the problem proceed down the normal path of digging into the root cause of the problem.

The objection to the latter is that incident and problem processes have competing goals. The incident process wants to restore service as quickly as possible; the problem process wants to resolve the root cause of the issue so it never happens again. By placing an aspect of incident in the problem space you're running against the goals of the process.

As I said -- neither is really wrong -- ITIL is vague in what it tells you to do and the system will allow you to do either.


Mike,

I just wanted to send you a quick response. Your thoughts helped us think through some tradeoffs.

We've opted to manage major incidents with a problem record. Then we'll use that as the parent for any incidents suffering from the same "problem". We have built it on the Task_rel_task relationships and updated the "resolve children" and "communicate workaround" actions to pull the right relationships to incidents.

Time will tell, but I like how it is built out.

The problem has some flavor of both incident (restore service) and problem (determine root cause and fix). But that's handled in the state, workaround, and root cause information on the record.
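For anyone trying to picture it, here is a simplified sketch of the idea in plain Python. It is not our actual implementation or ServiceNow script: the relationship list is a hypothetical stand-in for a relationship table, and the function names, states, and fields are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    number: str
    kind: str  # "incident" or "problem"
    state: str = "open"
    workaround: str = ""
    root_cause: str = ""


# Hypothetical stand-in for a relationship table: (parent number, child number) rows.
Relationship = Tuple[str, str]


def related_incidents(problem: Task, tasks: Dict[str, Task],
                      relationships: List[Relationship]) -> List[Task]:
    """Follow relationship rows from the problem to the incidents linked to it."""
    return [
        tasks[child]
        for parent, child in relationships
        if parent == problem.number and tasks[child].kind == "incident"
    ]


def communicate_workaround(problem: Task, tasks: Dict[str, Task],
                           relationships: List[Relationship]) -> List[str]:
    """Build one workaround message per related incident."""
    return [
        f"{inc.number}: workaround from {problem.number}: {problem.workaround}"
        for inc in related_incidents(problem, tasks, relationships)
    ]


def resolve_children(problem: Task, tasks: Dict[str, Task],
                     relationships: List[Relationship]) -> int:
    """Resolve every related incident that is still open; return how many changed."""
    count = 0
    for inc in related_incidents(problem, tasks, relationships):
        if inc.state != "resolved":
            inc.state = "resolved"
            count += 1
    return count


tasks = {
    "PRB0001": Task("PRB0001", "problem", workaround="Use webmail"),
    "INC0010": Task("INC0010", "incident"),
    "INC0011": Task("INC0011", "incident"),
}
relationships = [("PRB0001", "INC0010"), ("PRB0001", "INC0011")]
problem = tasks["PRB0001"]

notes = communicate_workaround(problem, tasks, relationships)
resolved = resolve_children(problem, tasks, relationships)
# Service is restored, but the problem stays open until root_cause is filled in.
```

Note that resolving the children does not close the problem; the problem's own state and root cause field carry the investigation after the fire is out.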

-Troy