Incident state if no workaround available and problem management has been started

lauraraitti
Kilo Contributor

As a new incident manager, I sometimes encounter situations where there is no workaround for an incident. The IT supplier cites the complexity of the incident and wants to open a problem ticket to investigate the issue. Often, the issue described in the ticket cannot be reproduced. What should the state of the incident be in this situation? The IT supplier naturally suggests putting it on hold, but what do you think, and what is the best practice according to ITIL 4?

 


Itallo Brandão
Tera Guru

Hi @lauraraitti ,

Welcome to the world of Incident Management! This is a classic "tug-of-war" between Incident Managers and IT Suppliers.

The Short Answer: According to ITIL 4 best practices, if the service is not restored and no workaround is available, the Incident should not be hidden or paused simply to protect a supplier's SLA.

Here is a breakdown of how to handle this according to the framework:

1. Incident vs. Problem (Different Goals)

The primary goal of Incident Management is to restore normal service operation as quickly as possible. The goal of Problem Management is to identify the root cause and prevent recurrence.

  • Incident: Reflects the user's pain/outage.

  • Problem: Reflects the technical investigation.

2. The "On Hold" Trap

Suppliers often suggest "On Hold" to stop the SLA clock. However, ITIL 4 emphasizes value streams, meaning the end-to-end delivery of value to the user rather than the metrics of a single team. If you put an incident "On Hold" without a workaround:

  • The business impact is "silenced" in your reports.

  • The user is still unable to work, but the metrics show everything is "fine."

  • You lose the urgency required to find a workaround.
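
To make the reporting effect concrete, here is a minimal Python sketch. The state names, the rule that only "On Hold" pauses the clock, and the hour figures are all illustrative assumptions, not something taken from ITIL 4 or from any specific tool.

```python
# Hypothetical timeline: (state, hours spent in that state).
# The figures are made up for illustration only.
timeline = [("In Progress", 4), ("On Hold", 68)]

# Assumption: only the "On Hold" state pauses the SLA clock.
PAUSING_STATES = {"On Hold"}


def user_downtime_hours(segments):
    """Total time the user was without service, regardless of ticket state."""
    return sum(hours for _state, hours in segments)


def sla_elapsed_hours(segments):
    """Time counted against the SLA: segments in pausing states are skipped."""
    return sum(hours for state, hours in segments if state not in PAUSING_STATES)


print(user_downtime_hours(timeline))  # 72
print(sla_elapsed_hours(timeline))    # 4
```

In this made-up example the user has been down for 72 hours, while the SLA report would show only 4.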

3. Best Practice Recommendation

In a healthy ITIL environment, here is the recommended workflow for your situation:

  • Keep the Incident Active: If there is no workaround, the incident is still an active disruption, so it should remain In Progress. Move it to a dedicated state such as "On Hold - Awaiting Problem" only if your internal policy explicitly allows stopping the clock for third-party complexity.

  • Decouple the Records: The Problem record can take weeks to investigate, especially when the issue is complex or hard to reproduce. The Incident, however, stays open until service is restored, typically through a workaround. Once a workaround is applied, the Incident can be resolved, but the Problem remains open until the permanent fix is found.

  • SLA Management: If the supplier cannot reproduce the issue, the Problem remains "under investigation," but the Incident SLA continues to tick. This pressures the supplier to either find a workaround or provide more resources to reproduce the bug.
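
The workflow above can be sketched as a small decision function. This is only a rough illustration in Python: the class, field names, and state strings are hypothetical and would need to be mapped to whatever your own tool and policy define.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical fields, named for illustration only.
    workaround_applied: bool
    # True only if internal policy explicitly allows clock-stopping
    # while a third party investigates the linked Problem.
    policy_allows_hold_for_problem: bool = False


def recommended_incident_state(incident: Incident) -> str:
    """Suggest an Incident state; the linked Problem record is tracked separately."""
    if incident.workaround_applied:
        # Service is restored for the user, so the Incident can be resolved
        # even though the Problem stays open until the permanent fix.
        return "Resolved"
    if incident.policy_allows_hold_for_problem:
        return "On Hold - Awaiting Problem"
    # Default: the disruption is ongoing, so the Incident stays active.
    return "In Progress"


print(recommended_incident_state(Incident(workaround_applied=False)))
```

Note that the function never looks at the Problem state: whether the supplier can reproduce the issue does not change what the user is experiencing.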

4. Summary for your Supplier

You should tell your supplier: "The Incident represents the current service unavailability. We will open a Problem record for the root cause investigation, but the Incident will remain open and active until a workaround is provided to the user, as the business impact is still ongoing."

If this guidance helps you navigate your new role as Incident Manager, please mark it as Accepted Solution.

Best regards,
Brandão.