
Chris Shakespea
ServiceNow Employee

 

Problem Management and making it successful

 

Why is Problem Management important?

The goal of zero incidents, pursued to improve the resiliency and efficiency of enterprise IT, is shared by many. It becomes harder to achieve as enterprises adopt new technologies that are faster, more scalable and have a shorter time to market, making their IT operations more agile but also more complex. Problem Management can play a key role in drawing closer to this ambitious goal.

Many organizations struggle to demonstrate value from Problem Management and therefore to achieve its benefits. Successful Problem Management requires maturity both in its dependent processes and in the organization itself.

Common outcomes from implementing Problem Management:

  • Improved quality of customer service due to less downtime and fewer disruptions
  • Reduced impact of incidents through more timely resolution
  • Improved user satisfaction
  • Reduced number of incidents to manage
  • Improved ability to meet Service Level commitments
  • Reduced agent frustration from handling recurring incidents

 

Summary of Recommendations

We have identified 10 steps for getting started with Problem Management.

 

  1. Build a business case
  2. Gain Executive Sponsorship
  3. Have a plan
  4. Build a team
  5. Make progress visible
  6. Establish a Problem Management process
  7. Conduct an Incident Management process review
  8. Problem Management knowledge repository
  9. Manage the starting scope, be realistic
  10. Measuring success – Problem Management Metrics

 

 

Recommendation Detail

 

Build a business case

Calculate potential ROI based on past repeat incidents or major outages that impacted the business.

State the objectives of the Problem Management process in business terms and link them, where possible, to key company objectives, e.g. reducing incidents, speeding up service to customers, reducing downtime, saving on support costs, etc.

 

Gain Executive Sponsorship

Gaining organizational commitment and sponsorship is key. The Executive Sponsor can ensure that the key stakeholders understand why the process is being implemented. Therefore, a committed and engaged Executive Sponsor will assist greatly in the success of the project, especially where the project has organizational impact.

 

Have a plan

The implementation of Problem Management has to be managed as a project, i.e.:

  • Determine Project Goals and relationship to key company objectives
  • Assign a Project Lead
  • Develop a plan and staffing requirements
  • Identify Organizational Change Requirements
  • etc.

This, together with appropriate governance, will ensure that the business keeps a clear understanding of the project and its progress.

 

 

Build a team

Having the right people in place is a fundamental requirement for Problem Management, which depends heavily on team members having good problem-solving, analysis and communication skills. They need to be technically aware but don’t necessarily need to be technical specialists, as they will often call upon and coordinate Subject Matter Experts as part of a virtual Problem Management team. Depending on the problem, they may have to adapt their approach to address it. Critically, the Problem Management role is to act as a catalyst for resolution, not as the resolver.

Appoint a strong Problem Manager. The Problem Manager is a key role in the success of the implementation. Be clear on their responsibilities: it is not an administrative role, and it calls for different skills and competencies from those that make a good Service Desk Manager.

 

 

Make progress visible

Report on progress to the Executive Sponsor and team, ideally as part of a management scorecard, and to the wider business. Consider socializing a top list of problems and highlight improvement opportunities for resolution. This will help sell the benefits of Problem Management.

 

Establish a Problem Management Process

ServiceNow provides a Problem Management process that ties directly into Incident Management and can consume data and capabilities from other processes, e.g. relating Incidents to Problems, Performance Analytics for reporting, and Continual Improvement Management for rectification projects.

Ensure that the process is documented, has appropriate process metrics, and is understood by everyone involved in it. See the Problem Management Process Guide for further guidance.

Remember, though, that process alone is not enough: it is critical that strong people skills support it. Capturing the responsibilities of the key players in a RACI (responsible, accountable, consulted, informed) chart can help facilitate this. It is important that the Problem Management team is separate from Incident Management activities. Although the two are strongly related, their key objectives are different.

 

 

Conduct an Incident Management Process Review

Whether you have been running your Incident Management process for some time or have only recently implemented it, it’s important to ensure that the tie-in with Problem Management is understood.

When Incidents are logged, how are key parameters such as the following used?

  • Categorizations
  • Service / Service Offerings
  • Assignment Groups

 

Ensure that these provide both an effective means of logging incidents and the level of information that Problem Management needs for trend analysis.

 

Value Analysis

It’s important that Problem Management is not seen as the place where old incidents go to die, but as a genuinely valued part of the operational processes. Executing the Problem Management process can take considerable effort, so it must provide a return on that investment, in this case through incident reduction.

Let’s look at some simple analysis:

 

ChrisShakespea_0-1668444839897.png

 

The FTE saving can be used either to provide higher capacity on the desk or, potentially more productively, to contribute to the Problem Management process and drive further savings. In this way, implementing the process is to some degree self-funding.

 

Depending on the customer, these could be conservative figures. One customer reported:

"every customer ticket we get, creates 3 callbacks..at..$11 per call and $4 per chat adds up to an overwhelming spend for us"

 

As well as this direct ROI, there is also indirect ROI to be achieved, for example:

  • Improved service quality leading to higher employee CSAT.
  • Reduced risk of incidents impacting the business.

 

Measure both the direct ROI and indirect ROI items to show the total outcome achieved.
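
As a rough illustration of the direct ROI arithmetic, the sketch below uses the two figures from the customer quote above (3 callbacks per ticket, $11 per call) and assumes every callback is a phone call. The monthly ticket volume and the reduction rate are hypothetical placeholders, not figures from this article, and should be replaced with your own incident data.

```python
# Figures taken from the customer quote; treating every callback as a phone call is an assumption.
CALLBACKS_PER_TICKET = 3
COST_PER_CALL_USD = 11


def follow_up_cost(tickets: int) -> int:
    """Callback spend generated by a given number of tickets."""
    return tickets * CALLBACKS_PER_TICKET * COST_PER_CALL_USD


def avoided_cost(tickets_per_month: int, reduction_rate: float) -> float:
    """Monthly callback spend avoided if a share of tickets no longer occurs."""
    avoided_tickets = tickets_per_month * reduction_rate
    return avoided_tickets * CALLBACKS_PER_TICKET * COST_PER_CALL_USD


# One ticket generates 3 callbacks at $11 each.
print(follow_up_cost(1))

# Hypothetical scenario: 10,000 tickets per month, 10% removed through problem resolution.
print(avoided_cost(10_000, 0.10))
```

The same calculation can be repeated with the chat cost, or extended with handling time per incident, to build up the direct ROI figure for the business case.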

 

A key linkage is to capture improvement activities through Continual Improvement Management. This helps ensure that the activities are accurately recorded and their impact understood.

 

To establish the saving opportunities, there are analysis techniques that can be used to assess incident data:

  • Manual Analysis
  • Machine Learning Analysis
  • Event Driven Problem Creation

 

Manual Analysis

Periodic trend analysis of incident records will help to identify and eliminate the source of recurring incidents. Performance Analytics ITSM dashboards can help with this analysis.

 

Incident Categorization

Incident categorization can be done through a number of means, e.g. categories, services, CIs. All these methods are valid and depend on the objectives of the organization. They also provide a means of structuring problem analysis.

e.g. Produce a simple pivot table with category and sub-category on one axis and date on the other, over, say, a 6-month period. Inspecting this data then allows further analysis of the causes. It’s important not to simply dismiss isolated peaks as anomalies: ask whether the Problem Management process was followed at the time, and whether the root cause can be correctly attributed. For example, a public holiday that causes high demand and a system outage will recur, so measures should be taken.
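
The pivot described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a ServiceNow feature: the sample records, the field order and the function names are all hypothetical, and in practice the data would come from an export of incident records.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records: (opened date, category, sub-category)
incidents = [
    (date(2024, 1, 5), "Network", "VPN"),
    (date(2024, 1, 9), "Network", "VPN"),
    (date(2024, 1, 20), "Software", "Email"),
    (date(2024, 2, 2), "Network", "VPN"),
    (date(2024, 2, 14), "Software", "Email"),
    (date(2024, 2, 15), "Software", "Email"),
]


def pivot_by_month(records):
    """Count incidents per (category, sub-category) row and 'YYYY-MM' column."""
    counts = Counter()
    for opened, category, subcategory in records:
        counts[((category, subcategory), opened.strftime("%Y-%m"))] += 1
    return counts


def print_pivot(counts):
    """Print rows of category / sub-category against one column per month."""
    rows = sorted({row for row, _ in counts})
    months = sorted({month for _, month in counts})
    print("category / sub-category".ljust(28) + "".join(m.rjust(9) for m in months))
    for row in rows:
        label = " / ".join(row).ljust(28)
        print(label + "".join(str(counts[(row, m)]).rjust(9) for m in months))


counts = pivot_by_month(incidents)
print_pivot(counts)
```

Peaks in a single cell, or a row that grows month on month, are the candidates for the further analysis described above.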

Other data within incidents can be useful such as affected user groups and locations.

 

Major Incident Analysis

As part of the major incident process, a root cause analysis should always be conducted to understand what caused the incident and to prevent it from happening again. Problem Management provides a means to capture, prioritize and manage that work.

 

Anecdotal

Asking the domain experts simply “What can we do to reduce the number of incidents that come to your team?” can lead to insights. Looking for “frequent flyers” in terms of incidents, and doing that on a regular basis with the service desk, helps ensure that these are being recorded and addressed. Supporting that discussion with data-driven analysis can deliver a more complete insight into what the domain experts are seeing in the field. Doing this regularly will help shift the culture to one where Problem Management is one of the things that just gets done.

 

 

 

Machine Learning Analysis

Machine learning (ML) provides the ability to analyze large numbers of records in an effort to look for patterns and similarities.  It can also be used to present information to agents that is contextually relevant to the incident or problem they are working on.

 

Clustering

Using ML to drive clustering analysis, which groups similar records together, can significantly reduce the effort of finding patterns in incident data. Once clusters of similar incidents are created, the problem manager can perform further analysis to identify critical problems from the incident information and prioritize the highest value problems to investigate.

Clustering can be used to help with proactive problem analysis.[1]

One approach is to cluster around resolution notes and resolution (close) codes on incidents.
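
To make the idea of clustering on resolution notes and close codes concrete, here is a deliberately simple sketch. It is not how Predictive Intelligence clusters records: it swaps in a greedy word-overlap (Jaccard) grouping so the example stays self-contained. The sample incidents, close code values, threshold and function names are all hypothetical.

```python
import re


def tokens(text: str) -> frozenset:
    """Lower-case word set, ignoring very short words."""
    return frozenset(w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Overlap between two token sets, from 0.0 to 1.0."""
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(incidents, threshold=0.5):
    """Greedy single pass: join the first cluster with the same close code
    whose seed resolution notes are similar enough, else start a new cluster."""
    clusters = []  # each entry: (close_code, seed_tokens, [incident numbers])
    for number, close_code, notes in incidents:
        toks = tokens(notes)
        for code, seed, members in clusters:
            if code == close_code and jaccard(seed, toks) >= threshold:
                members.append(number)
                break
        else:
            clusters.append((close_code, toks, [number]))
    return [members for _, _, members in clusters]


# Hypothetical incident data: (number, close code, resolution notes)
sample = [
    ("INC001", "Solved", "Restarted VPN gateway service after certificate expired"),
    ("INC002", "Solved", "VPN gateway certificate expired, restarted service"),
    ("INC003", "Workaround", "Cleared mailbox quota for user"),
    ("INC004", "Solved", "Restarted the VPN gateway service, certificate expired"),
]
groups = cluster(sample)
print(groups)
```

Here the three VPN certificate incidents fall into one group, which is the kind of cluster a problem manager would then review as a candidate problem.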

 

Similarity

The Predictive Intelligence similarity framework identifies existing records that have similar values to a new record. In this case the similar records are often displayed to the agent through their workspace, and the similarity recommendations speed up the association of records.

The Predictive Intelligence for Incident plugin[2], together with Predictive Intelligence for Major Incident Management, provides solution definitions as templates on instances where both Predictive Intelligence and Incident Management are active.

Similar Open Problems: recommends similar open problems that the current incident can be linked to.

The Similarity models need to be trained. Once trained, they can be used to predict Open Problems using ML Similarity, and the predicted results are displayed in Contextual Search or Agent Assist, depending on the UI.[3]

For Similar Open Problems, the results displayed are a combination of the following definitions (in priority order):

  1. Problem records whose short description is similar to the Incident's short description.
  2. All Problem records associated with Open Incidents that are similar (based on short description).
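
The priority ordering above, combined with the 80% confidence threshold mentioned in note [3], can be sketched as follows. This is an illustration of the ranking logic only, not the Predictive Intelligence implementation: the function name, the input shape (pairs of problem number and confidence) and the sample values are hypothetical.

```python
def similar_open_problems(direct_matches, via_incident_matches, min_confidence=0.80):
    """Merge two result sets in priority order, keeping confidence >= threshold.

    direct_matches: problems matched on their own short description (priority 1).
    via_incident_matches: problems reached through similar open incidents (priority 2).
    """
    results, seen = [], set()
    for source in (direct_matches, via_incident_matches):  # priority order
        for problem, confidence in sorted(source, key=lambda m: m[1], reverse=True):
            if confidence >= min_confidence and problem not in seen:
                seen.add(problem)
                results.append((problem, confidence))
    return results


# Hypothetical similarity results: (problem number, confidence)
direct = [("PRB001", 0.91), ("PRB002", 0.75)]
via_incidents = [("PRB003", 0.88), ("PRB001", 0.95)]

ranked = similar_open_problems(direct, via_incidents)
print(ranked)
```

In this sketch a problem found by both routes is listed once, at the position given by its higher-priority match, and anything below the threshold is dropped.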

 

 

[1] Viewing the solution visualization requires ml_admin_role, which is a broad role.

[2] Ref. plugin activation notes: https://docs.servicenow.com/?context=CSHelp:PI-Incident-Plugin&version=latest

[3] These solution definitions are trained with the past 6 months of data, and each has its own way of showing results. Result records with a confidence greater than or equal to 80% are displayed.

 

The retrieved Problem record can be linked to the Incident record.

  • If the search type is Similar Open Problems, the Problem field on the incident form will be populated with the corresponding problem number.
    • In Agent Assist, a modal will be displayed with an appropriate message for linking the problem record to the incident.
    • A success message is displayed in both UIs when the action is successful.

 

Event Driven Problem Creation

The creation of incidents through Event Management is a common use case. The same mechanism can be used to proactively create problem records, e.g. if the same event has triggered 30 times within a given period, create a problem record with a specific title, e.g. Proactive Problem ....

It is also possible to use the Clustering framework API in a scheduled script that creates a problem based on specified criteria and then associates the incidents with that problem.
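
The event-count rule described above (the same event triggering 30 times within a given period) can be sketched as a sliding-window counter. This is a standalone illustration rather than Event Management or Clustering framework code: the class name, the 24-hour window, the event key and the title format are hypothetical, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

THRESHOLD = 30                 # occurrences, as in the example above
WINDOW = timedelta(hours=24)   # hypothetical value for the "given period"


class ProactiveProblemDetector:
    def __init__(self, threshold=THRESHOLD, window=WINDOW):
        self.threshold = threshold
        self.window = window
        self.seen = defaultdict(deque)  # event key -> timestamps inside the window
        self.raised = set()             # event keys that already have a problem

    def record(self, event_key, when):
        """Register one event; return a problem title the first time the
        threshold is reached for that event key, otherwise None."""
        stamps = self.seen[event_key]
        stamps.append(when)
        # Drop timestamps that have fallen out of the window.
        while stamps and when - stamps[0] > self.window:
            stamps.popleft()
        if len(stamps) >= self.threshold and event_key not in self.raised:
            self.raised.add(event_key)
            return f"Proactive Problem: {event_key} triggered {len(stamps)} times"
        return None


# Hypothetical stream: the same event once a minute for 31 minutes.
detector = ProactiveProblemDetector()
start = datetime(2024, 1, 1)
titles = [detector.record("disk_full:srv01", start + timedelta(minutes=i)) for i in range(31)]
print(titles[29])
```

Tracking which event keys have already raised a problem avoids creating a duplicate problem record on every subsequent occurrence.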

 

 

Problem Management Knowledge Repository

An important step in Problem Management is documenting known error records for recurring issues, together with their workarounds, and communicating them, principally to the Incident Management team.

Once the root cause and permanent resolution have been identified, work with Change Management, where appropriate, to ensure that the change is raised, although not all problems can justifiably be fixed. The assessment and root cause analysis steps are critical to understanding the risk to the business and the value of resolving the problem. Make Problem Management an integral part of a wider CSI (Continual Service Improvement) culture within the organization.

 

 

Manage the starting scope, be realistic

Keep the scope simple to get started and accept that it won’t be perfect initially. The performance of the process will improve over time, so incident volumes may initially continue to rise (or show no impact) in spite of the resources put into Problem Management.

Begin with some analysis, as suggested in the Manual Analysis section, looking for quick wins, i.e. identify one or two high-visibility problems with high business impact. Use these within the management reporting to demonstrate value and success.

Ultimately the combined resource requirements for Incident and Problem Management will drop below the old Incident Management requirements.

 

 

Measuring Success - Problem Management Metrics

With Problem Management, the purpose is to understand the underlying cause of issues and permanently fix them, no matter how long that takes or whether it is even possible. Therefore, speed of resolution is not something that should be measured in Problem Management. Measuring it would drive the wrong behavior, focusing the process on closing records rather than finding the permanent fix. Process Owners need to feel comfortable with problem records potentially remaining open for months or even years.

 

Process KPIs

  • Provide information on the effectiveness of the process and the impact of continuous improvement efforts
  • Are best represented as trend lines and tracked over time
  • Monitored by the Process Owner 

 

ChrisShakespea_1-1668445164817.png

 

Operational Data

  • Provides information on active problems, giving visibility and oversight
  • Best tracked on a dashboard or homepage
  • Monitored by the Service Desk Teams

 

ChrisShakespea_2-1668445258089.png

 

 

Comments
adejong
Tera Contributor

I was wondering how one gets the "Risk Accepted problems with new incidents " KPI/Report in ServiceNow?

AlexB64
Tera Contributor

What an absolutely top article you've written here, Chris. I've worked in Service Management for over 20 years and have seldom seen Problem Management articulated in such a comprehensive and intuitive way.

vinodkumar2010
Tera Contributor

Nice content

Version history
Last update:
‎02-18-2025 07:26 AM