Problem management process in IT industries

Sanjay Bagri1 · ‎01-03-2019

What is problem management?

Problem management is one aspect of ITIL implementation that gives many organizations headaches. The difficulty lies in the similarity between incident management and problem management. The two processes are so closely aligned that differentiating the activities can become difficult for ITIL novices. At what point does one turn into the other? In some organizations, the two processes are so closely related that they are combined altogether. The differences are important, however, since the two processes are not the same and have different objectives.

The term “problem” refers to the unknown cause of one or more incidents. A useful metaphor for understanding the relationship between problems and incidents is to think of the relationship between a disease and its symptoms. In this metaphor, the disease is the problem and the symptoms are the incidents. Just as a doctor uses the symptoms to diagnose the disease, so problem management uses the incidents to diagnose the problem.

Problem management’s first activity is to diagnose the problem and validate any workarounds. Problem management uses a problem database to track problems and to associate any identified workarounds with them. Once the problem has been diagnosed and a workaround identified, the problem is referred to as a “known error.” These are documented in the known error database (KEDB), which may be the same physical database as the problem database. The KEDB is a significant tool for incident management in resolving incidents caused by known errors.

After the known error has been identified, the next step is to determine how to fix it. This will typically involve a change to one or more CIs, so the output of the problem management process would be a request for change, which would then be evaluated by the change management process, or included in the CSI register.

Problem management is thought of as a reactive process in that it is invoked after incidents have occurred, but it is actually proactive, since its goal is to ensure that incidents do not recur in the future, or if they do, to minimize their impact.

The purpose of problem management

When users continue to face the same incidents without resolution, they lose trust in the service desk’s ability to resolve any problem. Hence the primary objective of problem management is to identify, troubleshoot, document, and resolve the root causes of repeated incidents. Incident information filters up to problem management, and problem management, in turn, provides the service desk with the known error and workaround information necessary to mitigate problems in the short term.

Problems include issues such as failing hardware or an inadequately configured database query. Problem management reduces incidents over the long term. Incident reduction decreases the load on the service desk, improves end-user satisfaction, and decreases the long-term costs associated with user and service downtime. When problems cannot be resolved, problem management works with the service desk to mitigate the impact of the related incidents. The end goal of problem management should always be to reduce the overall quantity of preventable incidents and thereby increase the quality of service provided.

The scope of problem management

Problem management has a very limited scope and includes the following activities:

Problem detection
Problem logging
Problem categorization
Problem prioritization
Problem investigation and diagnosis
Creating a known error record
Problem resolution and closure
Major problem review

The problem management process

The ITIL problem management process has many steps, and each is vitally important to the success of the process and the quality of service delivered.

The first step is to detect the problem. A problem is raised either through escalation from the service desk, or through proactive evaluation of incident patterns and alerts from event management or continual service improvement processes. Signs of a problem include incidents that occur across the organization with similar conditions, incidents that repeat despite otherwise successful troubleshooting, and incidents that are unresolvable at the service desk.
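The proactive side of detection, spotting incidents that keep occurring under similar conditions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Incident fields, the find_problem_candidates function, and the threshold of three incidents are all made up for this example and do not come from ITIL or any specific tool.

```python
from collections import Counter
from typing import NamedTuple


class Incident(NamedTuple):
    incident_id: str
    category: str  # hypothetical category label
    symptom: str   # short, normalized symptom description


def find_problem_candidates(incidents, threshold=3):
    """Return (category, symptom) pairs seen in at least `threshold` incidents."""
    counts = Counter((i.category, i.symptom) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)


# Sample data, invented for the example.
incidents = [
    Incident("INC-1", "email", "cannot send"),
    Incident("INC-2", "email", "cannot send"),
    Incident("INC-3", "network", "vpn drops"),
    Incident("INC-4", "email", "cannot send"),
]
candidates = find_problem_candidates(incidents)
print(candidates)
```

In practice the grouping key and threshold would depend on how an organization records its incidents; the point is only that repeated incidents with similar conditions are a signal to raise a problem.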

The second step is to log the problem. In an ITIL framework, each problem is logged in a problem record, and together these records form a log of every problem in the organization. This can be accomplished via a ticketing system that allows for problem ticket types. Pertinent problem data, such as the time and date of occurrence, the related incident(s), the symptoms, previous troubleshooting steps, and the problem category, all help the problem management team research the root cause.
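As a rough illustration, the data listed above could be held in a structure like the following Python sketch. The class name ProblemRecord, its field names, and the link_incident helper are assumptions made for this example; a real ticketing system will have its own schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProblemRecord:
    """One logged problem (hypothetical fields based on the data listed above)."""
    problem_id: str
    occurred_at: datetime
    category: str
    symptoms: str
    related_incidents: List[str] = field(default_factory=list)
    troubleshooting_steps: List[str] = field(default_factory=list)

    def link_incident(self, incident_id: str) -> None:
        """Associate an incident with this problem, ignoring duplicates."""
        if incident_id not in self.related_incidents:
            self.related_incidents.append(incident_id)


# Sample record, invented for the example.
record = ProblemRecord(
    problem_id="PRB-1",
    occurred_at=datetime(2019, 1, 3, 9, 30),
    category="email",
    symptoms="Users cannot send mail",
)
record.link_incident("INC-1")
record.link_incident("INC-2")
record.link_incident("INC-1")  # already linked, so nothing changes
record.troubleshooting_steps.append("Restarted mail client")
print(record.related_incidents)
```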

The third step is to categorize the problem. Problem categorization should match incident categorization. Incident and problem categorization involves assigning a main and secondary category to the issue. This step is beneficial in several ways. One benefit is that it allows the service desk to sort and model incidents that occur regularly. A second is that this modeling allows for automatic assignment of priority. The third and most important benefit is the ability to gather and report on service desk data. This data allows the organization not only to track problem trends, but also to assess their effect on service demand and service provider capacity.

The fourth step is to prioritize the problem. A problem’s priority is determined by its impact on users and on the business and its urgency. Urgency is how quickly the organization requires a resolution to the problem. The impact is a measure of the extent of potential damage the problem can cause the organization. Prioritizing the problem allows an organization to utilize investigative resources most effectively. It also allows organizations to mitigate damage to the service level agreement (SLA) by reallocating resources as soon as the issue is known.
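One common way to combine impact and urgency is a simple matrix. The Python sketch below assumes a three-level scale (high, medium, low) and a priority from 1 (highest) to 5 (lowest); the scale, the formula, and the problem_priority name are assumptions for illustration, and each organization defines its own matrix.

```python
# Hypothetical three-level scale; a lower number means more severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def problem_priority(impact: str, urgency: str) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


# Sample problems, invented for the example: (id, impact, urgency)
problems = [
    ("PRB-3", "low", "medium"),
    ("PRB-1", "high", "high"),
    ("PRB-2", "medium", "high"),
]

# Investigate the highest-priority problems first.
work_queue = sorted(problems, key=lambda p: problem_priority(p[1], p[2]))
for problem_id, impact, urgency in work_queue:
    print(problem_id, "priority", problem_priority(impact, urgency))
```

Sorting by the computed priority gives the order in which investigative resources would be assigned under this particular matrix.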

The fifth step is a two-part process, which involves investigating and diagnosing the problem. The speed at which a problem is investigated and diagnosed depends on its assigned priority. High-priority issues should always be addressed first, as their impact on services is the greatest. Correct categorization helps here, since identifying trends is easier when problem categories correlate to incident categories. Diagnosis usually involves analyzing the incidents that led to the problem report as well as further testing that may not be possible at the service desk level, such as advanced log analysis.

The sixth step is to identify a workaround for the problem. A workaround should always be identified, because problems are not resolved at the incident level. A workaround enables the service desk to restore services to users while the problem is being resolved. A problem can take anywhere from an hour to months to resolve, so a workaround is vital. A problem is considered open until resolved, so a workaround should only be considered a temporary measure.

Step seven is to raise a known error record. Once the workaround has been identified, it should be communicated to staff within the organization as a known error. It’s good practice to record a known error in both an incident knowledge base and what ITIL calls a known error database (KEDB). Documenting the workaround allows the service desk to resolve incidents quickly and avoid further problems being raised on the same issue.
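To make the idea concrete, here is a minimal Python sketch of a known error database that the service desk could query by symptom. The KnownError and KnownErrorDatabase classes, their methods, and the exact-match lookup on a normalized symptom string are all assumptions for this example; a real KEDB would normally offer richer search.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class KnownError:
    problem_id: str
    symptom: str
    workaround: str


class KnownErrorDatabase:
    """In-memory stand-in for a KEDB, keyed by normalized symptom text."""

    def __init__(self) -> None:
        self._by_symptom: Dict[str, KnownError] = {}

    @staticmethod
    def _normalize(symptom: str) -> str:
        # Lowercase and collapse whitespace so small variations still match.
        return " ".join(symptom.lower().split())

    def record(self, known_error: KnownError) -> None:
        self._by_symptom[self._normalize(known_error.symptom)] = known_error

    def find_workaround(self, symptom: str) -> Optional[str]:
        entry = self._by_symptom.get(self._normalize(symptom))
        return entry.workaround if entry else None


# Sample entry, invented for the example.
kedb = KnownErrorDatabase()
kedb.record(KnownError("PRB-1", "Cannot send mail", "Use webmail until the fix is deployed"))

print(kedb.find_workaround("cannot  send MAIL"))
print(kedb.find_workaround("printer offline"))
```

When a lookup finds a match, the service desk can apply the documented workaround instead of raising another problem on the same issue.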

Step eight is to resolve the problem. Problems should be resolved whenever possible. Resolution addresses the underlying cause of a set of incidents and prevents those incidents from recurring. Some resolutions may require review by the change management board, as they may affect service levels. For example, a database switchover may cause slowness during the switchover period. All risks should be evaluated and accounted for before implementing the resolution. Document the steps taken to resolve the problem in the organization’s knowledge base.

The ninth step is to close the problem. This step should only occur after the problem has been raised, categorized, prioritized, identified, diagnosed, and resolved. While many organizations stop at this step, it isn't the last according to ITIL.
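The closure rule above amounts to a simple precondition check. The Python sketch below shows one way to express it; the step names are taken from the list above, while the function names and the idea of tracking completed steps as strings are assumptions for illustration.

```python
# Steps that must be complete before a problem may be closed (from the list above).
REQUIRED_BEFORE_CLOSURE = (
    "raised",
    "categorized",
    "prioritized",
    "identified",
    "diagnosed",
    "resolved",
)


def missing_steps(completed):
    """Return the required steps not yet completed, in process order."""
    done = set(completed)
    return [step for step in REQUIRED_BEFORE_CLOSURE if step not in done]


def can_close(completed):
    """A problem can be closed only when no required step is missing."""
    return not missing_steps(completed)


# Sample progress, invented for the example.
progress = ["raised", "categorized", "prioritized", "identified"]
print(can_close(progress))
print(missing_steps(progress))
```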

The final step is to review the problem. This is also known as a major problem review. The major problem review is an organizational activity that helps prevent future problems. During the review, the problem management team evaluates the problem documentation and identifies what happened and why. Lessons learned, such as process bottlenecks, what went wrong, and what helped, should be discussed. This is where having a complete problem log will help. A completed log will work much better than trying to pull the details from memory. This problem review should result in improved processes, staff training, or more complete documentation.

I hope this helps you.

Thanks

Sanjay Bagri