ITIL Process Question regarding incident closure
08-22-2014 08:38 AM
Good Morning,
This morning we encountered a Priority 1 Incident for a whole office losing their network connection. The cause was quickly determined to be a broken UPS. Once they bypassed the UPS, the network connection issue was resolved and the office was running again. The UPS however still needs to be replaced.
Should the incident be resolved once the immediate issue was fixed and network connections resumed? If so, where and how is the UPS replacement tracked? According to ITIL V3 it's still an incident: "Failure of a Configuration Item that has not yet impacted Service is also an Incident." Is that logged as a new incident? That would keep the SLAs happy. Would creating a new incident confuse the end user?
I'm curious how you would handle this situation in your environment or what you would consider the best or proper process to be.
Thanks,
Scott
08-22-2014 08:57 AM
Hello Scott,
This is where Problem Management comes into play. According to ITIL, the focus of Incident Management is to get the customer up and running as quickly as possible. Once that happens you have to, organizationally (process), figure out whether to leave that Incident open or to close it. The focus of Problem Management would be to figure out what the root cause of the Incident or failure was and to put something in place to ensure that failure doesn't repeat itself, ever again. Lofty goal huh?
So here's where things get sticky. Many organizations would like to close that Incident as quickly as possible in order to meet their SLAs. So you have to decide whether to keep that Incident open until the Problem Record is closed or to close it before the Problem Record is closed. You also have to take into consideration how that affects your metrics and reporting. And last but not least, how that affects your improvement efforts. So it takes some thought. The Incident Manager is not going to want that Incident to stay open until the Problem Record is resolved.
Not sure where you got that "Failure of a Configuration Item that has not yet impacted Service is also an Incident" definition. If that's true there's no such thing as Proactive Problem Management. The ITIL V3 definition of an Incident is "An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident. For example, Failure of one disk from a mirror set." If you like, at some other time we can have a discussion about what happens if a server's case is vibrating, or a tech note comes out about a potential bug in the firmware of a drive in the RAID array that has your company's payroll application on it.
ITIL doesn't necessarily mean that's the way things should be done. ITIL is not prescriptive so you have to consider all of the real world scenarios, how you are organizationally structured, what your processes and procedures are, what policy dictates, etc.
You see...in theory, theory and practice are the same but in practice they are not. I think Einstein said that.
Here's a document you may be able to use. It has all of the ITIL processes, process steps, embedded Visio diagrams, CSFs, KPIs, etc. I think I may even post this document on the community at some point. Feel free to download and use at your discretion.
ITSM Process Interface Document (PID)
https://app.box.com/s/wz4lgmxjxazjm3i575xk
08-22-2014 09:41 AM
Scott,
You have a little flexibility; however, since you implemented a workaround, the original incident should be closed and an RFC should be opened to cover the UPS replacement.
The aim of Incident Management is to restore the service as quickly as you can, and that will often mean that a workaround or temporary fix is employed rather than a more permanent fix being completed.
If you needed to make a configuration change before implementing the workaround then it would be appropriate to keep the incident open pending the change. Your case does not seem to be that sort of a situation.
The quote you listed is referencing a situation where you may have discovered an issue that could impact services . . . as in, you had a server that is constantly running at 90+% of capacity and appears to be near failure or maybe someone walked through your data center and placed their hand on a UPS and noticed that it was overheating . . . even if it hadn't failed yet it could still be considered an incident.
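For anyone who wants to see the workaround-then-RFC handling from this reply as a record flow, here is a minimal sketch. It assumes a made-up `Record` class and `resolve_with_workaround` helper; neither corresponds to a real ITSM tool's API, and the state names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """Hypothetical ticket record; not tied to any real ITSM product."""
    kind: str                       # "incident" or "rfc"
    summary: str
    state: str = "open"
    notes: List[str] = field(default_factory=list)
    related: List["Record"] = field(default_factory=list)


def resolve_with_workaround(incident: Record, workaround: str,
                            permanent_fix: str) -> Record:
    """Resolve the incident once service is restored and open a linked RFC."""
    # Service is back, so the incident itself can be resolved.
    incident.state = "resolved"
    incident.notes.append(f"Workaround applied: {workaround}")
    # The permanent fix is tracked separately as a change request.
    rfc = Record(kind="rfc", summary=permanent_fix)
    incident.related.append(rfc)
    return rfc


incident = Record("incident", "Whole office lost network connection")
rfc = resolve_with_workaround(incident, "UPS bypassed", "Replace failed UPS")
print(incident.state, rfc.kind, rfc.state)  # prints "resolved rfc open"
```

The point of keeping the RFC as its own record is that the incident's resolution time reflects when service came back, while the UPS replacement is still tracked and traceable from the incident.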
08-22-2014 10:10 AM
Seems to me like only one refrigerator fell in your case...
ITIL Refrigerator by Pavlo Rudenko on Prezi
08-22-2014 10:17 AM
Nice breakdown by Pavlo. Thanks for sharing.
.
.
.
I'm going to steal it for user training . . .