ITIL Problem Management Best Practices

Prasant Kumar · 04-30-2020

Problem Management (PM) is one of the processes in the ITIL Service Operation stage. The primary focus of PM is to identify the causes of service issues and commission corrective work to prevent them from recurring.

Problem Management is both reactive and proactive: reactive in solving problems in response to incidents, and proactive in identifying and resolving potential problems before they cause incidents.

Step 1: Define Roles and Responsibilities

There should be a designated Problem Manager responsible for identifying problems, both during daily operations and through historical reporting that reveals recurring incidents. Depending on the size of your organization, this may not be a full-time job, but it is a necessary role.
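As an illustration of the historical-reporting side of this role, the Python sketch below flags problem candidates by counting incidents that affect the same configuration item within a look-back window. It is a minimal sketch under stated assumptions: the `Incident` record, the `recurring_candidates` function, the 30-day window and the threshold of three are hypothetical choices, not ITIL or ServiceNow defaults.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class Incident:
    # Hypothetical minimal incident record; real ITSM tools carry many more fields.
    number: str
    config_item: str  # affected configuration item, used here as the grouping key
    opened: date


def recurring_candidates(incidents: List[Incident], today: date,
                         window_days: int = 30, threshold: int = 3) -> List[Tuple[str, int]]:
    """Return (config_item, count) pairs with at least `threshold` incidents
    opened in the last `window_days` days, most frequent first.

    The window and threshold are illustrative assumptions; tune them locally.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter(i.config_item for i in incidents if i.opened >= cutoff)
    return [(ci, n) for ci, n in counts.most_common() if n >= threshold]


history = [
    Incident("INC001", "email-server", date(2020, 4, 1)),
    Incident("INC002", "email-server", date(2020, 4, 10)),
    Incident("INC003", "vpn-gateway", date(2020, 4, 12)),
    Incident("INC004", "email-server", date(2020, 4, 20)),
    Incident("INC005", "email-server", date(2020, 2, 1)),  # outside the window
]
print(recurring_candidates(history, today=date(2020, 4, 30)))
```

Here `email-server` is flagged with three recent incidents, while the February incident falls outside the window and the single VPN incident stays below the threshold.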

Additionally, the Service Desk Manager should be in direct communication with the Problem Manager, as he or she will likely be the first to be alerted when a cluster of Critical or High Priority incidents is opened.

The primary objectives of Problem Management are:
1) To diagnose the root cause of the problem
2) To provide either a temporary fix or a workaround for the problem
3) To control the error, either by leaving the temporary fix in place or by permanently repairing the condition

Step 2: Focus on Root Cause

Create a documented process for Root Cause Analysis that describes which techniques will be used. These can include brainstorming, Causal Mapping or any other technique that reliably uncovers the underlying cause.
This process should be a group exercise, with the group composed of representatives from every area where a breakdown could have occurred.
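Causal Mapping can be represented as a directed graph in which each effect points to its direct causes. The Python sketch below, assuming the group records its map in that form, walks back from a symptom and returns the causes that have no further recorded cause, i.e. the candidates the group should validate as root causes. The map contents and the `candidate_root_causes` name are hypothetical, and the output is a shortlist for discussion, not a diagnosis.

```python
from typing import Dict, List, Set

# A causal map recorded as "effect -> list of its direct causes".
CausalMap = Dict[str, List[str]]


def candidate_root_causes(cmap: CausalMap, symptom: str) -> Set[str]:
    """Trace back from the symptom and return every cause that has no
    recorded cause of its own (the leaves of the map)."""
    roots: Set[str] = set()
    seen: Set[str] = set()
    stack = [symptom]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a hand-drawn map may contain cycles
        seen.add(node)
        causes = cmap.get(node, [])
        if not causes and node != symptom:
            roots.add(node)
        stack.extend(causes)
    return roots


outage_map: CausalMap = {
    "email outage": ["mail server crash", "DNS misrouting"],
    "mail server crash": ["disk full"],
    "disk full": ["log rotation disabled"],
    "DNS misrouting": ["untested DNS change"],
}
print(sorted(candidate_root_causes(outage_map, "email outage")))
```

Because each branch is followed to its end, the sketch surfaces one candidate per causal chain rather than stopping at the first plausible explanation.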

Step 3: Make a “Known Error” Known

Once the root cause has been identified and a workaround is in place, the problem becomes a “known error.” The workaround should be communicated to all end users who submitted an incident, and those incidents should be placed in a “resolved” status.

The Problem record should be set to a “known error” status. Additionally, the known error and workaround should be published to the knowledge base so the Service Desk can use them to resolve incidents.

Continue to log related incidents as they are reported and link them to the Problem record. If the published workaround has been applied for the end user, the newly linked incidents should be set to “resolved.” This stops SLA calculations against the incidents but does not allow full closure until the problem is resolved and closed. Once the environment has stabilized and end-user productivity has been restored through the workaround, Problem Managers must decide whether permanently fixing the root cause is economically viable or whether the workaround should become permanent.
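The known-error flow described in this step behaves like a small state machine. The Python sketch below models it under the assumption of a simplified problem record that holds its linked incidents; the class, state and method names (`Problem`, `record_known_error`, `link_incident`, `close_incident`) and the sample root cause are hypothetical and do not mirror any particular ITSM tool's schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class IncidentState(Enum):
    OPEN = "open"
    RESOLVED = "resolved"  # workaround applied; SLA clock stops
    CLOSED = "closed"      # only allowed once the problem is closed


class ProblemState(Enum):
    OPEN = "open"
    KNOWN_ERROR = "known error"
    CLOSED = "closed"


@dataclass
class Problem:
    number: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    state: ProblemState = ProblemState.OPEN
    incidents: Dict[str, IncidentState] = field(default_factory=dict)

    def record_known_error(self, root_cause: str, workaround: str) -> None:
        # Root cause plus workaround turns the problem into a known error,
        # and every incident already linked is resolved (not closed).
        self.root_cause, self.workaround = root_cause, workaround
        self.state = ProblemState.KNOWN_ERROR
        for number in self.incidents:
            self.incidents[number] = IncidentState.RESOLVED

    def link_incident(self, number: str, workaround_applied: bool) -> None:
        # Related incidents keep being linked; if the published workaround
        # was applied for the user, the new incident starts out resolved.
        if self.state is ProblemState.KNOWN_ERROR and workaround_applied:
            self.incidents[number] = IncidentState.RESOLVED
        else:
            self.incidents[number] = IncidentState.OPEN

    def close_incident(self, number: str) -> None:
        # Full closure waits until the problem itself has been closed.
        if self.state is not ProblemState.CLOSED:
            raise ValueError(f"{number}: problem {self.number} is not closed yet")
        self.incidents[number] = IncidentState.CLOSED


prb = Problem("PRB001")
prb.link_incident("INC001", workaround_applied=False)
prb.record_known_error("expired TLS certificate", "use the backup mail relay")
prb.link_incident("INC002", workaround_applied=True)
print({n: s.value for n, s in prb.incidents.items()})
```

Keeping "resolved" and "closed" as distinct incident states is what lets SLA clocks stop early while full closure still waits on the problem.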

Step 4: Weigh the ROI

If the return on investment (ROI) for repairing a root cause cannot be achieved within six months, consider leaving the workaround in place.

If repairing the root cause is feasible, or necessary regardless of how long the ROI takes, the Problem Manager and the assignee may have to initiate a Request for Change (RFC).
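The six-month rule can be read as a payback-period test. The Python sketch below assumes ROI is measured as simple payback: the one-off cost of the permanent repair divided by the monthly cost that the workaround keeps incurring (for example, effort spent applying it to new incidents). The function names and cost figures are hypothetical, and your organization may measure ROI differently.

```python
import math


def payback_months(repair_cost: float, monthly_workaround_cost: float) -> float:
    """Months until the one-off repair cost is recovered by no longer
    paying the ongoing cost of the workaround."""
    if monthly_workaround_cost <= 0:
        return math.inf  # nothing to recover, so the repair never pays back
    return repair_cost / monthly_workaround_cost


def recommend(repair_cost: float, monthly_workaround_cost: float,
              necessary: bool = False, horizon_months: float = 6) -> str:
    # A necessary repair goes ahead regardless of the payback period;
    # otherwise raise an RFC only if it pays back within the horizon.
    if necessary or payback_months(repair_cost, monthly_workaround_cost) <= horizon_months:
        return "raise RFC"
    return "keep workaround"


print(recommend(6000, 1500))                   # 4-month payback
print(recommend(12000, 1500))                  # 8-month payback
print(recommend(12000, 1500, necessary=True))  # required regardless of ROI
```

The `necessary` flag reflects the article's point that some repairs must proceed even when the numbers alone would favour the workaround.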

The change record is governed by the Change Management process, and just as incidents are linked to problems, the problem should be linked to the RFC.
When the RFC is successfully implemented and closed, the Problem record can in turn be closed, and any associated incidents will be closed with it.

Step 5: Review Problems Before Closing

Don’t automatically close Problem records when an RFC is complete.
They should be reviewed by the Problem Manager to ensure that any workaround in place is backed out, if necessary, so that the changed configuration item is actually used. This review also supports total contact ownership and customer satisfaction.
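The closure rules in Steps 4 and 5 can be sketched as a guard on the problem record: a closed RFC makes the problem eligible for closure, but only the Problem Manager's review actually releases it. This Python sketch assumes a simplified record with boolean flags; `ProblemRecord`, `on_rfc_closed`, `review` and `close` are hypothetical names, not fields of any real tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProblemRecord:
    number: str
    incidents: List[str] = field(default_factory=list)
    rfc_closed: bool = False
    reviewed: bool = False
    closed: bool = False
    closed_incidents: List[str] = field(default_factory=list)

    def on_rfc_closed(self) -> None:
        # Completing the linked RFC makes the problem eligible for closure;
        # it deliberately does NOT close the problem automatically.
        self.rfc_closed = True

    def review(self, workaround_cleared: bool) -> None:
        # Problem Manager review: `workaround_cleared` means any workaround that
        # would bypass the changed configuration item has been backed out,
        # or that none needed backing out.
        self.reviewed = workaround_cleared

    def close(self) -> None:
        if not (self.rfc_closed and self.reviewed):
            raise ValueError(f"{self.number}: RFC or Problem Manager review outstanding")
        self.closed = True
        # Closing the problem releases the linked incidents for closure.
        self.closed_incidents = list(self.incidents)


prb = ProblemRecord("PRB001", incidents=["INC001", "INC002"])
prb.on_rfc_closed()
prb.review(workaround_cleared=True)
prb.close()
print(prb.closed, prb.closed_incidents)
```

Separating `on_rfc_closed` from `close` is the design point: the RFC event alone can never close the problem.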

Step 6: Be Customer-centric

Focus on customers, not infrastructure. The natural tendency is to concentrate on the most troublesome infrastructure components. However, the goal of effective IT Service Management is to focus on customers. To this end, Problem Managers should sort recurring incidents by line of business and address the business unit with the most issues first.
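A minimal Python sketch of that sorting step, assuming recurring incidents have already been identified and tagged with the business unit they affect; the `rank_business_units` function and the sample data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_business_units(recurring: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank business units by their number of recurring incidents, most affected first.

    `recurring` holds (incident_number, business_unit) pairs; ties are
    broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(unit for _, unit in recurring)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


recurring = [
    ("INC001", "Finance"),
    ("INC002", "Sales"),
    ("INC003", "Finance"),
    ("INC004", "Logistics"),
    ("INC005", "Sales"),
    ("INC006", "Finance"),
]
print(rank_business_units(recurring))
```

With this sample, Finance comes first with three recurring incidents, so its problems would be addressed before those of Sales and Logistics.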

Thanks & Regards

Prasant kumar sahu

Donny Techiera · 07-18-2024

Greetings,

Thanks for this detailed explanation; it was really insightful.

lisarenner · 03-05-2025

For our P1/P2 incidents we create a problem ticket where we track the RCA and preventative tasks. Is there a template for an RCA that we can tie to a problem ticket, so we can build ServiceNow reports on RCAs? Where would I find more information on documenting RCAs in ServiceNow? I see that there is a Root Cause Analysis field in the Analysis Information tab of the problem, but our RCA document is much more extensive, and we currently write it in a Word document and attach it.

Mohandsaidi · 01-15-2026

Thank you for this work; it is a valuable help for beginners.

DEWeinfu · 01-26-2026

What is the answer to this question? I too would like to explore using the problem module for two different ticket types, one for RCA and the other for defined, prioritized problems (to manage corrective and preventative actions).