Enrich your (Incident) Life!

Organisations see many different advantages in IT Process Automation, and much is said about the benefits of taking routine, mundane, manual tasks away from overburdened operations personnel. However, one topic that seems to resonate consistently with organisations following an ITIL-based approach is that of ‘Incident Enrichment’. To explore why this might be the case, sit back, relax, and let me tell a story…

“A couple of years ago, ABC Corp threw themselves headlong into implementing ITIL, as they saw it as the ‘Magic Pill’ to cure all their IT operations woes. To support this initiative, they invested in the latest ServiceDesk technology from ‘Big Joes ServiceDesk Solutions’. And they loved it. They could finally see how many incidents were being raised and who was handling them, and could report to management on ‘Mean Time To Recovery’ (MTTR), ‘Mean Time Between System Incidents’ and lots of other metrics they had learned about in their ITIL training. Even many of the guys at the coalface liked it; they could finally prove how hard they were working (and the IT department even ran a few friendly competitions in the early days, awarding Dons Delicious Donuts vouchers to the guys who closed the most incidents). And they could only wonder at all the other ITIL processes that they could implement later.

But the honeymoon period soon ended. ABC Corp wanted to introduce Problem Management – they knew that the best IT departments were all into ‘Proactive Problem Management’. Mary was appointed ABC’s ‘Problem Manager’. You had to hand it to her, she was keen. “Let’s review our Incidents for the last 6 months and see what we can identify as Known Errors”, she said. She pored over the data, but it was decidedly lacking in information. “I can’t believe this”, said Mary. “Look at all these incidents that have been closed without information. The admins have just commented ‘problem fixed’!” She needed to know why, so she headed off in search of ABC’s administrator extraordinaire, ‘Mike the MCSE’. Mike was quick to point out that since all the budget cuts there were fewer staff, and updating tickets had just become a layer of bureaucracy that they didn’t need and, frankly, didn’t feel they had time to deal with. “After all”, Mike said, “I can resolve a customer problem in 2 minutes, yet it can take me 10 more to update the ticket with all the steps I took”. Mary knew she needed to find a solution…”

In this story, Mary could work with ABC Corp’s IT Management to insist that incidents are updated more thoroughly, but they would likely hit resistance, and might find that they achieve only a temporary result. It is much better to find a solution that actually helps the guys on the ground be more efficient and effective, resulting in improved service delivery. Many organisations try to achieve this exchange of data between technologies through initiatives like auto-ticketing from monitoring tools, so that the incident carries the exception information from the originating event or alert. Whilst this is a step in the right direction, it presents challenges. After all, you don’t want to auto-ticket on all events, as you’d just overwhelm your Service Desk. So how do you choose which events or alerts should be auto-ticketed, and how do you deal with the threat of an event storm? And this doesn’t solve the problem of admins like ‘Mike the MCSE’ not updating incidents later in their lifecycle.
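To make the auto-ticketing questions a little more concrete, here is a minimal Python sketch of one possible approach: only ticket events at or above a severity threshold, and suppress repeats of the same alert within a time window so that an event storm raises one incident rather than hundreds. Everything in it (the `Event` fields, the 1 to 5 severity scale, the `AutoTicketFilter` name and its default values) is a hypothetical illustration, not the behaviour of any particular monitoring or Service Desk product.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    source: str       # host or service that raised the event
    check: str        # name of the failing check
    severity: int     # assumed scale: 1 = informational ... 5 = critical
    timestamp: float  # seconds since some reference point


@dataclass
class AutoTicketFilter:
    min_severity: int = 4            # assumed threshold for raising a ticket
    suppress_seconds: float = 300.0  # assumed storm-suppression window
    _last_ticketed: dict = field(default_factory=dict)

    def should_ticket(self, event: Event) -> bool:
        """Return True if this event should open a new incident."""
        # Rule 1: ignore low-severity noise altogether.
        if event.severity < self.min_severity:
            return False
        # Rule 2: during a storm, the same source/check pair only
        # produces one ticket per suppression window.
        key = (event.source, event.check)
        last = self._last_ticketed.get(key)
        if last is not None and event.timestamp - last < self.suppress_seconds:
            return False
        self._last_ticketed[key] = event.timestamp
        return True
```

With the defaults above, a critical `disk_full` event from the same server arriving every minute would open one incident and then stay quiet for five minutes. The right threshold and window are judgment calls for each organisation, which is exactly the difficulty described above.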

Ultimately, any initiative like this is trying to support the Incident Management goal of restoring service as quickly as possible. To achieve this, the Incident Management process has a number of steps, typically to Identify, Record, Categorise, Prioritise, Diagnose, Resolve & Recover and Close the incident. It is the time taken to move through these activities, from the Identify step to the Resolve & Recover step, that gives us useful metrics like MTTR. Common sense therefore dictates that if we can give the people and technology making decisions at each step better information, so that they can decide faster and on a better-informed basis, we reduce our MTTR, meet service levels, and potentially improve the perception of IT. This is where IT Process Automation (ITPA) can help. By enriching incidents with timely, pertinent data, perhaps End User Response Time data to determine ‘who’ is actually affected, or further diagnostic information, we can enable these decisions to be taken quickly. In turn, enriched incidents provide a valuable input into Problem Management, where we can use them to identify Known Errors. The handling of these Known Errors could then be automated further, with ITPA orchestrating the steps of the Incident Management process from Record to Recovery, without human intervention if appropriate.
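As a rough illustration of the two ideas in this paragraph, the Python sketch below shows an automated enrichment step writing diagnostic data into an incident’s work log, and MTTR calculated as the mean time from Identify to Resolve & Recover. The `Incident` class, the `enrich` function and the collector names are assumptions made up for this example; a real Service Desk tool would expose its own API for updating tickets.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass
class Incident:
    summary: str
    identified_at: datetime                 # the Identify step
    resolved_at: Optional[datetime] = None  # the Resolve & Recover step
    work_log: list[str] = field(default_factory=list)


def enrich(incident: Incident, collectors: dict[str, Callable[[], str]]) -> None:
    """Append the output of each (hypothetical) collector to the work log.

    This is the part that saves an admin from typing the details up by hand.
    """
    for name, collect in collectors.items():
        incident.work_log.append(f"[auto] {name}: {collect()}")


def mean_time_to_recovery(incidents: list[Incident]) -> timedelta:
    """Mean of (resolved_at - identified_at) over resolved incidents only."""
    durations = [
        i.resolved_at - i.identified_at
        for i in incidents
        if i.resolved_at is not None
    ]
    if not durations:
        raise ValueError("no resolved incidents to measure")
    return sum(durations, timedelta()) / len(durations)
```

In practice a collector might be a function that gathers disk usage, recent log lines or end-user response times. The point is that the information lands in the incident automatically, so it is there for whoever diagnoses it and for Problem Management later.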