In this tutorial, we will discuss the ITIL Problem Management Process. In this chapter, you will learn what a problem is in ITIL, along with the definition, objective, scope, lifecycle, activities, roles, and sub-processes of Problem Management in ITIL V3. We will also discuss the concepts of ITIL Proactive Problem Management and Reactive Problem Management.
What is ITIL Problem Management Process?
The ITIL Problem Management Process is responsible for managing the lifecycle of all problems that happen or could happen in an IT service.
Though the ITIL Problem Management process is closely related to managing incidents, it is a step beyond Incident Management. The Problem Management and Incident Management processes are so similar in nature that many organizations combine them and have the same team handle both.
Although both processes look very similar from the outside, the true difference lies in their objectives. While Incident Management works to restore the affected services to their normal state, Problem Management (ITIL V3) works to find and resolve the root cause. As an analogy, you may think of the problem as a disease and the incidents as the symptoms of that disease.
What is a Problem in ITIL?
Before going deeper into the ITIL Problem Management Process, let us first understand what a problem is.
In ITIL V3, a problem is raised when one or more related incidents have a root cause that is yet to be identified. The ITIL v3 documentation formally defines a ‘problem’ as the underlying cause of one or more incidents.
Minor incidents involving consumable resources, such as mouse or keyboard issues, are not considered problems. But incidents such as repeated network outages or repeated failures of server hardware or applications are considered problems and are investigated by Problem Management.
ITIL Problem Management Objective:
The primary objective of the ITIL Problem Management Process is to prevent incidents from happening, and to minimize the impact of incidents that cannot be prevented.
Some other important objectives of this process are as follows:
- Find the root cause of any problem.
- Resolve all problems as fast as possible (at least according to agreed service levels) and monitor the effectiveness of the implemented solution.
- Proactively prevent the recurrence of incidents caused by underlying problems, taking into account Incident Management data and suspected problems.
- Maintain information about problems and the appropriate workarounds and resolutions.
ITIL Problem Management Purpose:
If an incident keeps recurring and the service desk is not able to provide a permanent solution, the issue is transferred to Problem Management. The purpose of the transfer is to identify, troubleshoot, resolve, and document the root causes of repeated incidents.
As described in ITIL, Problem Management provides the service desk with the known error (KEDB entry) and workaround information necessary to mitigate issues in the short term.
Another important purpose of ITIL Problem Management is therefore to reduce the frequency of incidents over the long term. A reduction in the total number of incidents reduces the load on the service desk, improves customer and user satisfaction, and decreases the long-term costs associated with downtime.
If a problem cannot be solved immediately, Problem Management works jointly with the service desk to reduce the impact of the related incidents. The final goal of the Problem Management process (ITIL V3) should always be to bring down the total number of preventable incidents and thereby increase service quality.
The Scope of ITIL Problem Management Process:
Problem Management has a very limited scope and set of activities within ITIL V3. To fulfill its objectives and purpose, it continuously coordinates with other ITIL processes and functions. To identify the scope of the ITIL Problem Management process, we will now discuss some of these process interactions:
Service Desk: This is the most important function that interacts with Problem Management. Because of the nature of its work, the Service Desk is the central point of contact for both end users and other ITIL processes. Hence, when resolving any reported problem, the Problem Management team has to work side by side with the Service Desk.
Change Management: After finding the root cause of a problem, it is necessary to fix that root cause so that the issue does not occur again. This sometimes requires changes to the service or component, in which case Problem Management calls the Change Management process to make them.
Release and Deployment Management: The ITIL Problem Management process calls this process when the proposed change requires a new release to be developed and deployed.
Knowledge Management: The KEDB created by Problem Management is managed and maintained by Knowledge Management. Hence, a seamless communication channel has to be created and maintained between these two processes.
ITIL Proactive Problem Management and Reactive Problem Management:
As described by ITIL v3, this process can be divided into two types depending on the nature of its operation: (i) Reactive Problem Management and (ii) Proactive Problem Management.
(i) ITIL Reactive Problem Management:
This is the most common type of Problem Management observed in day-to-day operations. It is the means of finding the root cause of incidents and solving the problem as quickly as possible. It works as an integral part of ITIL Service Operation.
When incidents occur, Incident Management starts working on them as early as possible to resolve them and restore service to usable levels. During this process, some important indications and symptoms of the root cause can get lost.
So, in order to precisely identify the root cause, there should be a defined and agreed timeframe for handing incidents over from Incident Management to Problem Management.
(ii) ITIL Proactive Problem Management:
It covers the activities of identifying and solving problems and known errors before further related incidents can occur. It often includes reviewing reports from other processes to identify patterns and trends of recurring incident symptoms that may point to underlying problems.
Proactive problem management also identifies any training opportunities for IT staff, customers, and end users. At this point, it may also coordinate with Availability Management and Capacity Management for taking actions to prevent potential incidents from happening.
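As a rough illustration of the trend review described above, the sketch below counts incidents per service and symptom and flags combinations that recur often enough to suggest an underlying problem. The incident fields (`service`, `symptom`), the function name `find_recurring_patterns`, and the threshold of 3 are assumptions made for this example; ITIL does not prescribe them.

```python
from collections import Counter

def find_recurring_patterns(incidents, threshold=3):
    """Return (service, symptom) pairs seen at least `threshold` times.

    `incidents` is a list of dicts with the hypothetical keys
    'service' and 'symptom'; real incident tools use their own schemas.
    """
    counts = Counter((i["service"], i["symptom"]) for i in incidents)
    # Keep only combinations that recur often enough to suggest a problem.
    return {pair: n for pair, n in counts.items() if n >= threshold}

# Made-up sample data for illustration only.
incidents = [
    {"service": "email", "symptom": "timeout"},
    {"service": "email", "symptom": "timeout"},
    {"service": "vpn", "symptom": "login failure"},
    {"service": "email", "symptom": "timeout"},
]
candidates = find_recurring_patterns(incidents)
print(candidates)
```

Each flagged pair is only a candidate for a problem record; someone still has to review it before a problem is logged.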
ITIL Problem Management Lifecycle Activities:
The ITIL Problem Management Process describes a ten-step process for managing problems. These steps are also called Problem Management lifecycle activities. They are listed below and are usually followed in sequential order:
(i) Problem Detection: This is the step where the problem is detected. The problem can be detected through any of the following practices:
- Detection or Suspicion of a cause of one or more incidents by the Service Desk
- Analysis of incidents by a technical support group to find incidents that occur repeatedly despite rigorous troubleshooting.
- Automated detection and reporting of infrastructure or application issues by Event Management tools.
- A notification from a supplier about an existing problem that has to be resolved.
(ii) Problem logging: In this step, problems are logged in a Problem Record. The problem record should contain the following information:
- User details
- Service details
- Equipment details
- Priority and categorization details
- Date/time initially logged
- Problem Summary
- Related Incident Tickets
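The fields listed above could be modeled as a simple data structure. The following is a minimal sketch, not an ITIL-defined schema: the class name `ProblemRecord`, the field names, and the sample values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class ProblemRecord:
    """Illustrative problem record; field names are assumptions, not an ITIL schema."""
    user_details: str
    service_details: str
    equipment_details: str
    priority: str
    category: str
    summary: str
    # Date/time the problem was initially logged.
    logged_at: datetime = field(default_factory=datetime.now)
    # Identifiers of the incident tickets linked to this problem.
    related_incidents: List[str] = field(default_factory=list)

record = ProblemRecord(
    user_details="Finance team",
    service_details="Email service",
    equipment_details="Mail server MX-01",
    priority="High",
    category="Network",
    summary="Repeated email timeouts",
    related_incidents=["INC-1001", "INC-1007"],
)
# Link another incident once it is matched to this problem.
record.related_incidents.append("INC-1012")
print(len(record.related_incidents))
```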
(iii) Problem Categorization: Here problems are assigned to pre-defined categories according to the type, nature, attributes, and SLA of the underlying incidents. The assigned category should match the category of the related incidents.
(iv) Problem Prioritization: In this step, priorities are assigned to problems. A problem’s priority is determined in the same way as an incident’s: by its impact on users and the business, and by its urgency.
Problem prioritization should also consider the severity of the problem, that is, how serious it is from an infrastructure (or service or customer) perspective. Some questions to ask in this context are:
- Can the system be recovered, or does it need to be replaced?
- How much will it cost?
- How many people, with what type of skills, will be needed to fix the problem?
- How long will it take to fix the problem?
- How extensive is the problem (e.g., how many CIs are affected)?
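One way to combine impact and urgency into a priority is a lookup matrix. The sketch below assumes a 3x3 matrix with priorities from 1 (handle first) to 5 (handle last). The level names, the function `problem_priority`, and the mapping itself are assumptions for this example; each organization agrees its own matrix.

```python
# Hypothetical ranking of levels: 1 is the most severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}

def problem_priority(impact, urgency):
    """Map impact and urgency to a priority from 1 (first) to 5 (last).

    Raises KeyError if a level other than 'high', 'medium', or 'low' is given.
    """
    return LEVELS[impact] + LEVELS[urgency] - 1

print(problem_priority("high", "medium"))
```

Severity questions such as cost or the number of affected CIs are not modeled here; they would normally adjust the result through human judgment.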
(v) Problem Investigation and Diagnosis: This step involves investigating and diagnosing the reported problems. The speed of the investigation depends on the assigned category and priority. If the identified problem is related to any incidents, this step analyzes those incident records for trends and patterns that help identify the root cause.
(vi) Identify Workaround: This step helps to restore the service by identifying any possible workaround. Problems are usually critical in nature and can take anywhere from hours to months to solve permanently.
A workaround helps the organization restore services to the user even if the original problem is not yet resolved. The workaround should be considered only a temporary solution until the problem is resolved.
(vii) Creating a Known Error Record: Once the workaround is identified, the problem should be marked as a Known Error. It is essential to record the known error in the Known Error Database (KEDB) and upload it to the Service Knowledge Management System (SKMS).
Documenting the workaround allows the service desk to resolve incidents quickly and avoid further problems being raised on the same issue.
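To illustrate how a documented workaround can speed up incident handling, here is a minimal sketch of a keyword lookup against an in-memory list of known errors. The `KnownError` class, the keyword-matching rule, and the sample entries are assumptions for this example; a real KEDB tool will have its own data model and search.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KnownError:
    error_id: str
    symptom_keywords: List[str]
    workaround: str

def find_workaround(kedb: List[KnownError], incident_summary: str) -> Optional[KnownError]:
    """Return the first known error whose keywords all appear in the summary."""
    text = incident_summary.lower()
    for entry in kedb:
        if all(k.lower() in text for k in entry.symptom_keywords):
            return entry
    return None  # no match: the incident may need a new investigation

# Made-up entries for illustration only.
kedb = [
    KnownError("KE-001", ["email", "timeout"], "Restart the mail relay service."),
    KnownError("KE-002", ["vpn", "certificate"], "Re-import the client certificate."),
]
match = find_workaround(kedb, "User reports Email timeout when sending attachments")
print(match.workaround if match else "No known error found")
```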
(viii) Problem Resolution: This step provides the actual resolution of a problem. It resolves the underlying cause of a set of incidents and prevents those incidents from recurring. After the resolution is identified, it should be documented in the knowledge base along with the steps taken and the problem details.
(ix) Problem Closure: This step confirms the permanent resolution of a problem. It should also ensure that the problem record contains the full historical detail of all events.
(x) Major Problem Review: For major problems, this step is initiated after the problem closure step. It is an important activity for preventing similar problems in the future. Furthermore, it verifies whether the problems marked as closed have actually been eliminated. It is used to review and document the following:
- Those things that were done correctly.
- Those things that were done wrong.
- What could be done better in future?
- How to prevent recurrence?
- Document lessons learned.
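The ten activities above can be pictured as an ordered sequence in which the final review applies only to major problems. The sketch below is one hypothetical way to encode that ordering; the enum `ProblemStage` and the function `next_stage` are names invented for this example, and real tools may allow steps to be skipped or repeated.

```python
from enum import IntEnum

class ProblemStage(IntEnum):
    DETECTION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INVESTIGATION_AND_DIAGNOSIS = 5
    IDENTIFY_WORKAROUND = 6
    KNOWN_ERROR_RECORD = 7
    RESOLUTION = 8
    CLOSURE = 9
    MAJOR_PROBLEM_REVIEW = 10

def next_stage(current, is_major_problem=False):
    """Return the stage after `current`, or None when the lifecycle ends."""
    if current == ProblemStage.CLOSURE and not is_major_problem:
        return None  # the review applies only to major problems
    if current == ProblemStage.MAJOR_PROBLEM_REVIEW:
        return None
    return ProblemStage(current + 1)

# Walk an ordinary (non-major) problem through its lifecycle.
stage = ProblemStage.DETECTION
path = []
while stage is not None:
    path.append(stage)
    stage = next_stage(stage)
print([s.name for s in path])
```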
The diagram below shows the activities of ITIL Problem Management and the interrelationships between them:
ITIL Problem Management Sub-Process:
As described by ITIL v3, this process has seven sub-processes. The objectives and descriptions of those sub-processes are given below, followed by a diagram illustrating the ITIL Problem Management process flow. Please note that, unlike the activities, these sub-processes are usually NOT sequential:
1) Proactive Problem Identification:
Responsible for proactively identifying Problems and providing a suitable workaround before the actual incident occurs. It helps to improve the overall availability of services as well as customer satisfaction.
2) Problem Categorization and Prioritization:
Used to record, categorize and prioritize reported Problems. It helps to facilitate a swift and effective resolution.
3) Problem Diagnosis and Resolution:
Responsible for identifying the underlying root cause of a Problem and implementing the most appropriate and economical solution. This sub-process is also responsible for providing a temporary Workaround if possible. It is a vital sub-process of ITIL Problem Management, responsible for restoring user services within the SLA.
4) Problem and Error Control:
Used to continually monitor outstanding Problems with regard to their processing status so that corrective measures can be introduced whenever required.
5) Problem Closure and Evaluation:
Responsible for ensuring that the Problem Record documents a full historical description of the problem and that related Known Error Records are updated. This sub-process is initiated only after a successful resolution.
6) Major Problem Review:
For major problems, this sub-process is initiated after problem closure. It is an important activity for preventing similar problems in the future. Furthermore, it verifies whether the Problems marked as closed have actually been eliminated.
7) Problem Management Reporting:
Problem Management Reporting is responsible for communicating outstanding Problems, their processing status, and existing Workarounds to other IT Service Management processes as well as to IT Management.
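As a small sketch of what such a report might summarize, the example below groups outstanding problems by processing status and lists any workaround for each. The dictionary keys (`id`, `status`, `workaround`), the status values, and the function `build_problem_report` are assumptions made for illustration only.

```python
from collections import defaultdict

def build_problem_report(problems):
    """Group outstanding problems by status, with each problem's workaround."""
    report = defaultdict(list)
    for p in problems:
        if p["status"] == "closed":
            continue  # the report covers outstanding problems only
        report[p["status"]].append((p["id"], p.get("workaround", "none")))
    return dict(report)

# Made-up sample data for illustration only.
problems = [
    {"id": "PRB-1", "status": "under investigation"},
    {"id": "PRB-2", "status": "known error", "workaround": "Restart the mail relay service."},
    {"id": "PRB-3", "status": "closed"},
]
for status, items in build_problem_report(problems).items():
    print(status, items)
```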
The following image shows the process flow between sub-processes and their interrelationships:
Important Terminologies and Definitions:
Problem Record:
- The document containing all details of a Problem, documenting the history of the Problem from detection to closure.
Workaround:
- Workarounds are temporary solutions provided to users to reduce or eliminate the impact of Known Errors (and thus Problems) for which a full resolution is not yet available.
- Workarounds are often applied to reduce the impact of Incidents or Problems if their root causes cannot be readily identified or removed.
Known Error:
- A Known Error is a previously recorded problem that now has a documented root cause and a workaround.
- Known Errors are managed throughout their lifecycle by the Problem Management process.
- Usually, Known Errors are identified by Problem Management, but they may also be pointed out by other Service Management disciplines, e.g. Incident Management, or by suppliers.
Known Error Database (KEDB):
- It is a database consisting of previous knowledge of requests and known errors.
- It is created by Problem Management and used by Incident and Problem Management to manage all Known Error Records.
- Though the KEDB is created by Problem Management, it is also a part of the SKMS.
Problem Management Report:
- A document to report Problem-related information to the other Service Management processes.
ITIL Problem Management Roles and Responsibilities:
Problem Manager:
- This role is the Process Owner of the ITIL Problem Management Process.
- The Problem Manager is responsible for managing the lifecycle of all Problems.
- The primary objective of this problem manager role is to prevent recurrence of incidents and to minimize the impact of Incidents that cannot be prevented.
- This role is also responsible for maintaining information about Known Errors and Workarounds.