This tutorial covers the ITIL Incident Management process (ITIL IM). In this chapter, you will learn what an incident is in ITIL, and what a Major Incident, an Incident Management System (ICMS), an Incident Model, and Incident Prioritization are. It also covers the definition, objective, scope, activities, lifecycle, and sub-processes of Incident Management as an ITIL V3 process.
What is the ITIL Incident Management Process?
The ITIL Incident Management process is responsible for managing the life cycle of all incidents. It covers the ITSM activities of identifying, analyzing, and restoring IT services to their normal state as quickly as possible.
The incident management process is not meant to perform root cause analysis to identify why an incident occurred. Instead, its focus is on doing whatever is necessary to restore the service within the SLA. To achieve this, it often makes use of temporary fixes or workarounds.
An important tool in the diagnosis of incidents is the known error database (KEDB), which is maintained by problem management. The KEDB contains all known errors and their previously identified workarounds.
What is an Incident in ITIL?
Before starting to learn about ITIL Incident Management (IM), we first need to understand what an incident is.
As defined in ITIL v3, an incident is any unplanned disruption of an IT service, any degradation in the quality of an IT service, or the failure of any configuration item (CI) used to provide an IT service, even if the failure has not yet affected the service.
What is a Major Incident in ITIL?
According to ITIL V3, a major incident is an incident of the highest priority. An incident needs to be treated as a major incident if it has a significant impact on business continuity and must be addressed immediately. It is generally associated with the highest level of financial impact on the business.
To resolve this type of incident, a Major Incident Team follows separate procedures to reach a solution as quickly as possible.
What is an Incident Management System (ICMS)?
Incident management systems are a means of automating some of the repetitive work of the ITIL Incident Management process. They are designed to collect time-sensitive, consistent data and to document it as an incident report.
Some ICMS products can also collect real-time incident information (such as time and date data), send automated notifications, assign tasks, and escalate automatically to the appropriate levels.
Some modern products also let administrators configure and customize the incident report forms, create analysis reports, and set up level-based access control on the data.
ITIL Incident Management Objective:
The primary objective of the ITIL Incident Management process is to restore the IT service to its normal state as quickly as possible. The process is used to manage the lifecycle of all incidents (unplanned interruptions or reductions in the quality of IT services, or failures of components).
Some other objectives of ITIL Incident Management are as follows:
- Ensure that standardized methods and procedures are used for effective and prompt response, analysis, documentation, efficient management and reporting of incidents.
- Increase visibility and communication of incidents to business and IT support staff.
- Align incident management activities and priorities with the business strategy.
- Maintain user satisfaction by maintaining the quality of IT services.
ITIL Incident Management Scope:
Because managing incidents is so important, some organizations maintain a team dedicated solely to the incident management process. In most companies, the task of incident management is delegated to the service desk and its owners, managers, and stakeholders.
Incident management has close relationships with many other ITSM processes and functions. Some of the closest are as follows:
Change management: Sometimes resolution of an incident may require a change request to be raised. Moreover, it has been observed that a large percentage of incidents are caused by the implementation of changes. Hence, the "number of incidents caused by a change implementation" is a key performance indicator for change management.
Problem management: As described earlier, the ITIL Incident Management process depends heavily on the KEDB, which is maintained by problem management. In turn, problem management depends on the accurate collection of incident data in order to carry out its diagnostic responsibilities.
Service asset and configuration management: The configuration management system (CMS) is an important tool for incident resolution because it helps to identify the relationships between service components and also facilitates the integration of configuration data with the incident and problem data.
Service level management: A breach of a service level is itself an incident and acts as a trigger to invoke the service level management process. Moreover, the incident management process takes input from the service level agreement (SLA) about the timescales and escalation procedures defined for different types of incidents.
Service Desk: The Service Desk function is the single point of contact for all users to report incidents. Without the service desk, users would have to contact support staff directly, irrespective of the incident priority. The service desk filters and categorizes the reported incidents and provides 1st level support for every incident.
It is also important to remember that the ITIL Incident Management process also depends on the Technical Management and Application Management functions to get satisfactory resolution of numerous issues.
What are Incident Models in ITIL?
Many incidents are not new; they recur over time. Incident Models are the best way to deal with this type of incident.
An Incident Model is a way to pre-define ‘standard’ practices or procedures for handling specific types of incidents when they occur.
When defining Incident Models, an organization should include the following:
- The steps that should be taken to handle the incident.
- The responsibilities of every role: “who should do what?”.
- The chronological order in which these steps should be taken, with any dependencies or co-processing defined.
- Timescales and thresholds for completion of the actions.
- The escalation procedures, which describe “who should be contacted and when?”.
- Any necessary evidence-preservation activities.
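The elements of an Incident Model listed above can be captured in a small data structure. The following Python sketch is only an illustration: the class names, field names, the "Printer offline" model, and the minute thresholds are all assumptions made for this example and are not defined by ITIL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    order: int            # chronological position of the step
    action: str           # what should be done
    responsible: str      # who should do it
    time_limit_min: int   # threshold for completing the action
    depends_on: list[int] = field(default_factory=list)  # steps that must finish first


@dataclass
class IncidentModel:
    name: str
    steps: list[Step]
    escalation: dict[int, str]  # minutes elapsed -> who should be contacted
    evidence_to_preserve: list[str] = field(default_factory=list)

    def ordered_steps(self) -> list[Step]:
        """Return the steps in the order they should be carried out."""
        return sorted(self.steps, key=lambda s: s.order)

    def escalation_contact(self, elapsed_min: int) -> Optional[str]:
        """Return the contact for the highest threshold already passed, if any."""
        passed = [t for t in self.escalation if elapsed_min >= t]
        return self.escalation[max(passed)] if passed else None


# Hypothetical model for a recurring printer incident.
printer_outage = IncidentModel(
    name="Printer offline",
    steps=[
        Step(2, "Restart the print spooler", "1st Level Support", 15, depends_on=[1]),
        Step(1, "Confirm the printer is powered and connected", "Service Desk agent", 10),
    ],
    escalation={30: "2nd Level Support", 120: "Incident Manager"},
    evidence_to_preserve=["spooler error log"],
)
```

With this model, `printer_outage.escalation_contact(45)` answers "who should be contacted and when?" for an incident that has been open for 45 minutes.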
What is Incident Prioritization in ITIL?
Incident Prioritization is the process of ranking incidents on the basis of impact and urgency. This is done with a simple calculation, using the formula PRIORITY = IMPACT x URGENCY. Based on this formula, an organization can divide priority into several levels, such as High, Medium, and Low.
High-Priority Incident - Affects a large number of users or customers, interrupts business, affects service delivery, and usually has a financial impact.
Medium-Priority Incident - Affects a few staff (or group) and interrupts work to some degree. Customers may be slightly affected or inconvenienced.
Low-Priority Incident - A minor incident that has little or no impact on a single user and has an instant workaround.
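The priority formula can be sketched in a few lines of Python. The 1 to 3 scale for impact and urgency and the score thresholds used here are assumptions chosen for illustration; each organization defines its own scale and priority matrix.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical 1-3 scale shared by impact and urgency (not defined by ITIL)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> str:
    """Map impact x urgency to a priority label.

    The thresholds (>= 6 is High, >= 3 is Medium) are assumptions for this
    sketch; a real priority matrix comes from the organization's own policy.
    """
    score = impact * urgency  # possible values: 1, 2, 3, 4, 6, 9
    if score >= 6:
        return "High"
    if score >= 3:
        return "Medium"
    return "Low"
```

For example, an incident with high impact and medium urgency scores 3 x 2 = 6 and is classified as High under these assumed thresholds.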
The image below explains how the priority of an incident is calculated under the ITIL Incident Management Process:
What are the Incident Statuses Defined in ITIL?
As an incident moves through the Incident Lifecycle, its status changes. Below are the statuses defined under the ITIL incident management best practice guidelines, with short descriptions:
i) New: This status indicates that the service desk has received the incident but has not yet assigned it to a service desk agent.
ii) Assigned: This status indicates that the incident has been assigned to an individual service desk agent.
iii) In-Progress: Means that the incident has been assigned to an agent who is actively working to diagnose and resolve it.
iv) On-Hold: Indicates that the incident requires more information or a response from the user or from a third party. In this state, the SLA clock is stopped.
v) Resolved: Means that the service desk has confirmed that a solution to the incident has been provided and that the user’s service has been restored to the SLA levels.
vi) Under Observation: Means that the service desk has provided a solution to the incident and is still observing its effectiveness. This is usually done at the user's request, on a case-by-case basis.
vii) Closed: This is the final status, which means the incident is completely resolved and no further actions can be taken.
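The statuses above, together with the rule that the SLA clock stops while an incident is On-Hold, can be sketched as follows. This is a minimal illustration, not how any particular ICMS product works: the `Incident` class and its attribute names are hypothetical, and treating every status other than On-Hold as counting against the SLA is a simplifying assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    IN_PROGRESS = "In-Progress"
    ON_HOLD = "On-Hold"
    RESOLVED = "Resolved"
    UNDER_OBSERVATION = "Under Observation"
    CLOSED = "Closed"


class Incident:
    def __init__(self, opened_at: datetime) -> None:
        self.status = Status.NEW
        self._since = opened_at          # time of the last status change
        self.sla_elapsed = timedelta(0)  # time counted against the SLA so far

    def set_status(self, new_status: Status, at: datetime) -> None:
        if self.status is Status.CLOSED:
            raise ValueError("no further actions can be taken on a closed incident")
        # Time spent On-Hold is not counted against the SLA.
        # Assumption: time in every other status is counted.
        if self.status is not Status.ON_HOLD:
            self.sla_elapsed += at - self._since
        self.status = new_status
        self._since = at
```

An incident that is open for 100 minutes, 60 of which were spent On-Hold waiting for the user, accumulates only 40 minutes of SLA time in this sketch.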
ITIL Incident Management Lifecycle Activities:
The ITIL framework describes a nine-step process for managing incidents, also called the incident management lifecycle.
These activities are listed below and are usually followed in sequential order:
(i) Incident Identification: In this step the incident is detected or reported, either through input from event management tools or through any of the service desk channels.
(ii) Incident Logging: Once an incident is confirmed, the service desk records it in the Incident Management System (ICMS), and the incident is thus logged.
(iii) Incident Categorization: Here incidents are assigned to pre-defined categories according to their type, nature, attributes, SLA, etc. For example, an incident may be categorized as a network issue, server issue, infrastructure issue, printer issue, or desktop issue. This is a very important step for determining which team of which function will handle the issue.
(iv) Incident Prioritization: In this step, the incident is prioritized for better utilization of resources and support staff time. See the Incident Prioritization section above for details.
(v) Incident Diagnosis: This step reveals the full symptoms of the incident by asking the affected users first-level troubleshooting questions.
(vi) Incident Escalation: This is done when the primary support team is not able to solve the issue and needs more advanced support. It often includes activities such as sending an on-site technician or requesting assistance from Level 2 or Level 3 support teams. There are two types of escalation: functional escalation and hierarchical escalation.
- Functional Escalation: Also known as horizontal escalation, it is a means of escalating an incident to a different team at the same level. For example, a Level 1 application team escalates an incident to a Level 1 technology team for further troubleshooting.
- Hierarchical Escalation: Also known as vertical escalation, it is a means of escalating an incident to a higher level of the same or a different function. For example, Level 2 support can escalate a critical problem to their team manager.
(vii) Investigation and Diagnosis: This step takes place if no existing solution from the past can be found and the incident requires deeper investigation. Here actions are taken to find the root cause of the issue. If correcting the root cause is not possible for some reason, a Problem Record is created and the error correction is transferred to Problem Management.
(viii) Resolution and Recovery: Once a resolution is found, it is implemented to restore normal service. This is where the Service Desk confirms that the affected service has been restored within the defined SLA level.
(ix) Incident Closure: This is where the incident is considered closed, and the incident record in the Incident Management System (ICMS) is closed with the end status of the incident.
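The two escalation types described in step (vi) can be contrasted with a short Python sketch. The `Team` structure, the team names, and the rule of picking the first peer team at the same level are assumptions made for this example; a real escalation matrix would come from the SLA.

```python
from dataclasses import dataclass
from enum import Enum


class Escalation(Enum):
    FUNCTIONAL = "functional"      # horizontal: another team at the same level
    HIERARCHICAL = "hierarchical"  # vertical: a higher level of authority


@dataclass(frozen=True)
class Team:
    name: str
    level: int
    manager: str


def escalate(current: Team, kind: Escalation, teams: list[Team]) -> str:
    """Return the name of the team or person that receives the incident."""
    if kind is Escalation.HIERARCHICAL:
        return current.manager
    # Functional escalation: hand over to a different team at the same level.
    for team in teams:
        if team.level == current.level and team != current:
            return team.name
    raise LookupError("no peer team at the same level")


# Hypothetical teams, mirroring the examples in the text.
l1_app = Team("Level 1 Application", 1, "Application Team Manager")
l1_tech = Team("Level 1 Technology", 1, "Technology Team Manager")
```

Here `escalate(l1_app, Escalation.FUNCTIONAL, [l1_app, l1_tech])` hands the incident to the Level 1 Technology team, while a hierarchical escalation from the same team goes to its manager.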
The diagram below shows the activities of the ITIL Incident Management lifecycle and the interrelationships between them:
ITIL Incident Management Sub-Process:
As described by ITIL v3, the Incident Management process has nine sub-processes, which are usually NOT sequential. Below are the objectives and short descriptions of those sub-processes, followed by a diagram illustrating the ITIL incident management process flow:
1) Incident Management Support:
This sub-process has overall responsibility for providing and maintaining the tools, processes, skills, and rules for effective and efficient handling of Incidents.
2) Incident Logging and Categorization:
Used to record, categorize and prioritize the reported Incident with utmost care, in order to facilitate a fast and effective resolution.
3) Pro-Active User Information:
A means of informing users about service failures as soon as they are known to the Service Desk, so that users can prepare themselves to deal with the interruptions. Proactive user information also helps to reduce the number of inquiries from users.
4) Immediate Incident Resolution by 1st Level Support:
This is the approach of quickly solving an Incident with a workaround provided by 1st level support (Role). If the 1st Level Support isn't able to resolve the Incident or when target times for 1st level resolution are exceeded, the Incident is transferred to a suitable group of 2nd Level Support.
5) Incident Resolution by 2nd Level Support:
2nd level support (Role) works on more complex Incidents or supports the 1st level when it fails to provide a resolution within a pre-defined time. The objective of this stage is to recover the service or provide a workaround within the SLA.
If required, 2nd Level Support may also involve 3rd Level Support (Role) or specialist support groups for resolution. If correcting the root cause is not possible, they create a Problem Record and transfer it to Problem Management.
6) Handling of Major Incidents:
This sub-process is initiated specifically to handle Major Incidents. Its aim is the fast recovery of the service, by means of a workaround where necessary.
7) Incident Monitoring and Escalation:
Used to continuously monitor the status of ongoing Incidents, so that actions can be taken as soon as possible if service levels are likely to be breached.
8) Incident Closure and Evaluation:
After a resolution is found, the Incident Record is submitted for a final quality check before it is closed. This sub-process aims to validate that the Incident is actually resolved and that all information regarding the Incident's life cycle is recorded for future use.
9) Incident Management Reporting:
Incident Management Reporting is responsible for communicating Incident-related information to other Service Management processes and stakeholders.
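Sub-process 7, Incident Monitoring and Escalation, checks whether ongoing Incidents are likely to breach their service levels. A minimal Python sketch of such a check is shown below. The resolution targets per priority and the 80% warning threshold are assumptions for illustration only; real values are taken from the agreed SLA.

```python
from datetime import timedelta

# Hypothetical resolution targets per priority; real values come from the SLA.
SLA_TARGETS = {
    "High": timedelta(hours=4),
    "Medium": timedelta(hours=8),
    "Low": timedelta(hours=24),
}


def sla_state(priority: str, elapsed: timedelta, warn_ratio: float = 0.8) -> str:
    """Classify an open incident as 'ok', 'at_risk', or 'breached'.

    warn_ratio is an assumed threshold: once 80% of the target has been
    used, the incident is treated as likely to breach and can be escalated.
    """
    target = SLA_TARGETS[priority]
    if elapsed >= target:
        return "breached"
    if elapsed >= target * warn_ratio:
        return "at_risk"
    return "ok"
```

A monitoring job could run this check periodically over all open Incidents and trigger an escalation for every Incident reported as at risk.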
The following image illustrates the ITIL incident management process flow and how the sub-processes interact:
Key Elements for The Success of Incident Management:
Successful execution of the incident management process requires several key elements to be present:
- A well-defined service level agreement between the service provider and the customer. The agreement should clearly define incident priorities, the escalation matrix, response and resolution time frames, etc.
- Defined incident models and templates that allow incidents to be resolved efficiently
- Criteria for the categorization of incident types for better data collection and problem management
- Agreement on incident statuses, priorities, categories
- Establishment of a major incident response process operating under the ITIL Incident Management Process.
- Understanding of the incident management roles and responsibilities
Important Terminologies and Definitions:
Incident Escalation Rules:
- A set of rules defining a hierarchy and triggers for escalating Incidents. Triggers are usually based on Incident severity and resolution times.
Incident Management Report:
- A report providing Incident-related information to the other Service Management processes.
Incident Record:
- A set of data containing details of an Incident, documenting the entire history of the Incident from logging until closure.
Incident Status Information:
- A message describing the present status of an Incident sent to a user who reported the disruption of service.
- Status information is typically provided to users at various points during an Incident's lifecycle. The incident statuses are described earlier in this article.
Notification of Service Failure:
- The reporting of a service failure to the Service Desk, for example by a user via telephone or e-mail, or by a system monitoring tool.
Status Inquiry:
- An inquiry regarding the present status of an Incident or Service Request.
- It is usually raised by a user who reported an Incident or submitted a request.
Support Request:
- A request to support the resolution of an Incident or Problem, usually issued from the Incident or Problem Management processes when further assistance is needed from technical experts.
User FAQs:
- Self-help information made available to users by the Service Desk, usually as part of the Support Pages on the intranet.
ITIL Incident Management Roles and Responsibilities:
Incident Manager:
- The Incident Manager is the Process Owner of the ITIL Incident Management process.
- The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting.
- The Incident Manager is also responsible for ensuring that Incidents are resolved within the agreed SLA targets.
1st Level Support:
- The responsibility of 1st-Level Support is to register and classify received Incidents and to undertake immediate actions for restoring failed IT services as quickly as possible.
- If they are not able to provide a solution, they need to transfer the Incident to expert technical support groups (2nd Level Support).
- 1st Level Support also has the responsibility for keeping the users informed about their Incidents' status at agreed intervals.
2nd Level Support:
- 2nd Level Support takes over Incidents which cannot be solved immediately by the 1st Level Support.
- 2nd Level Support generally consists of specialist IT support staff.
- If necessary, they can also request external support, e.g. from software or hardware manufacturers.
- The aim of this support level is to restore a failed IT service as quickly as possible.
- If no solution can be found, then 2nd Level Support can either involve 3rd level support or transfer the incident to Problem Management.
3rd Level Support:
- 3rd Level Support is typically located at hardware or software manufacturers (third-party suppliers).
- Its services are requested by 2nd Level Support for solving complex incidents.
Major Incident Team:
- A team of IT managers and technical experts put together to work exclusively on the resolution of a Major Incident.
- It is generally dynamically established and works under the purview of the Incident Manager.