Guest post by Sergio Banguero - ITOM toolset and Automation Engineer - Schlumberger
I am part of the IT organization at Schlumberger, the world’s leading oilfield services company. Because of the recent headwinds that hit the energy industry, my team has faced staff reductions and lower budgets for IT projects.
My team provides internal support for all of IT operations. We call our program the Automation Factory because that describes what we do: we look for opportunities within the company and then produce automation resources that help the organization improve its efficiency.
Do more with less
Our IT organization was challenged to help improve the situation. Our team faced lower budgets and fewer staff, yet we needed to keep doing our current tasks and more.
Discovering the manual processes
When we started analyzing the situation from an IT perspective in 2015, and especially the support processes that the different IT people follow, we noticed that many of the tasks support people perform are repetitive, rule-based and easily definable. The remaining processes depend on the human value that people provide, such as:
- The ability to communicate well
- The ability to come to an agreement
- Higher cognitive abilities
Understanding the gaps
To understand where to fill the gaps we developed the Service Automation Maturity model below.
The Automation Maturity Scale defines the automation maturity of any process. A process that people perform manually is at the manual level, which is level zero. A process managed through a script is at the scripted level, and so on, up to the intelligent level, where the task is handled by technologies such as Artificial Intelligence.
No artificial intelligence yet
We started by mapping the different IT processes onto this scale and defining our goal. That exercise showed us that we did not have the technology for the top level: nothing on the market yet lets you define a problem and take it to Artificial Intelligence quickly, easily or cheaply. So we set out to do what we could with Micro Focus Operations Bridge (OpsBridge) and create a type of automation based on Bots.
Finding the ROI
We have a team based in Kuala Lumpur that is the analog of a network operations center (NOC). This team handles all of the network and infrastructure events for the organization, which is a lot. The group consists of more than 50 people and has its own structure, specialties and technology. We found that the team was handling many recurring events and following the same process each time to fix or remediate them, which made this a good opportunity for automation. We set out to discover the different processes they were running and the amount of time they were spending on them. This kind of ROI analysis showed us that there were a lot of opportunities.
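As a rough illustration of this kind of ROI ranking, here is a minimal Python sketch. It is not the team's actual analysis: the process names, monthly counts and handling times are made-up placeholders, and ranking by count multiplied by minutes per occurrence is only an assumption about how such an analysis might be done.

```python
from dataclasses import dataclass


@dataclass
class SupportProcess:
    name: str                # hypothetical process name
    runs_per_month: int      # hypothetical: how often the NOC handles it
    minutes_per_run: float   # hypothetical: manual handling time per occurrence


def monthly_hours(process: SupportProcess) -> float:
    """Manual effort spent on one process per month, in hours."""
    return process.runs_per_month * process.minutes_per_run / 60


def rank_by_effort(processes: list[SupportProcess]) -> list[SupportProcess]:
    """Order candidates so the largest manual effort comes first."""
    return sorted(processes, key=monthly_hours, reverse=True)


# Placeholder figures, for illustration only.
candidates = [
    SupportProcess("password reset", runs_per_month=60, minutes_per_run=5),
    SupportProcess("network device down", runs_per_month=400, minutes_per_run=15),
    SupportProcess("disk space filling up", runs_per_month=250, minutes_per_run=10),
]

for process in rank_by_effort(candidates):
    print(f"{process.name}: {monthly_hours(process):.1f} hours per month")
```

The processes at the top of such a list are the ones where automation would free up the most operator time, so they are natural first candidates.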
Not everything matches the documentation
We deployed one person from our team in Kuala Lumpur, who joined the NOC team so that they could connect with the analysts and really understand the documentation they had on the different support processes. The NOC team had a lot of documentation and, as you might imagine, it did not really match what they were doing in real life.
For example, for a network-device-down event, the documentation described a workflow, a series of steps, but the analyst was doing something different. The activity was not totally different, but some of the steps were.
Once we had classified the tasks, we needed an approach for converting this process mapping into automation. We already had the tool, Operations Orchestration, and we knew it was capable of executing runbooks, but we wanted to create something that was easy to maintain and could be enhanced. We wanted the flexibility to keep adding more processes.
Look for road blocks
We hit a lot of road blocks during the initial development because we discovered what we didn’t know. Our organization is global, and many of the teams that are involved in support activities or that own the different systems are in different parts of the world. We would start an automation and then discover that we needed access to a specific system, and often the person who could grant that access was on the other side of the world and asleep. This delayed our automations because we had not done the due diligence of identifying what we later called the technical dependencies.
Through trial and error, we came up with a process made up of smaller steps that lets us define the automation on paper before we start building it in the Operations Orchestration tool. First we perform the discovery and classify the opportunities based on the ROI. Then we evaluate what we have discovered and decide whether it is possible to automate.
If the answer is yes, we meet with all the decision makers: the flow developer, the person doing the activity, the network engineer who is designing the solution, and everyone else with a vested interest in the project. At the meeting, the team discusses the automation opportunity in detail and asks questions such as: if we need to restart the router, how do we log in to the router?
If the automation flow needs to create a ticket, we need to understand the technical way to create a ticket. We also need to understand a variety of other topics including:
- What is the web service that we will need to call?
- What are the user accounts?
- What are the right permissions?
- How do we fill out the templates for the ticket creation?
In the end, all of this is covered on paper, before the actual development, as part of resolving the technical dependencies.
If a major issue comes up at that point, we cancel the automation or review its scope.
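The paper checklist described above could be captured in a very small data structure. The following Python sketch is hypothetical: the class names, the example candidate and the sample answer are invented for illustration, and the team's real process was done on paper rather than in code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dependency:
    question: str
    answer: Optional[str] = None  # None means nobody has answered it yet

    @property
    def resolved(self) -> bool:
        return self.answer is not None


@dataclass
class AutomationCandidate:
    name: str
    dependencies: list[Dependency] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        """Questions that still block development."""
        return [d.question for d in self.dependencies if not d.resolved]

    def ready_for_development(self) -> bool:
        return not self.open_questions()


# Hypothetical candidate; the answer shown is a placeholder, not a real account.
candidate = AutomationCandidate(
    name="create a ticket for a recurring event",
    dependencies=[
        Dependency("What is the web service that we will need to call?"),
        Dependency("What are the user accounts?", answer="placeholder service account"),
        Dependency("What are the right permissions?"),
        Dependency("How do we fill out the templates for the ticket creation?"),
    ],
)

if not candidate.ready_for_development():
    print("Still open before development can start:")
    for question in candidate.open_questions():
        print(" -", question)
```

Keeping unanswered questions visible in one place is what makes it possible to cancel or rescope an automation before any development time is spent on it.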
You don’t have to do it all to get value
This is where another relevant point of our methodology came up: we automate as much as we can. If the scope is reduced and we can automate only 10 percent of the activity instead of 100 percent, we still make the effort to automate that 10 percent and then try to add the rest in later iterations. The benefit is that we can still show some value, and people can start to understand how automation fits into their activity.
Wanted an expert flow developer
We did not have a flow developer in our organization, so we hired an external consultant from a Micro Focus Partner in France to be our expert automation architect and flow developer. He helped decide the direction to take, and he really helped us envision the concept of these auto-triggered Bots.
We scheduled certain dates during the month to meet with him in what we call Automation Day. This was the day we dedicated to building the automation opportunities that had already been through technical feasibility, technical dependency resolution and so on.
Once all of the paperwork was done, we had collected a lot of information. We used that information to create the specific development steps, and then we built the automation with him.
During these sessions, we found that when we did not understand something or ran into an issue, it helped to have access to one of the experts. So we made it a requirement that the expert be present or at least reachable by phone.
Then we started developing more and more remediation flows for typical support activities that the NOC team was doing. By doing this, I think we created more than 10 remediation flows that a single Bot is running automatically.
We do test, but not while we are developing. Testing is a lot of effort, so we distribute it among everyone who is part of the automation effort.
With the right automation runbooks implemented in Operations Orchestration (OO), we had reached level 3, Orchestrated Automation.
Defining the core structure for auto-trigger
We decided to create a core structure for the automation that is analogous to an industrial robot arm. In the automotive industry, a robot arm is programmed for one function, such as moving an object from one table to another. You can then put more logic into the robot, such as moving the object depending on its weight.
We wanted to do remediation based on different types of events. So we created a Flow Launcher, or Bot, that identifies the type of event and then assigns the remediation flow that has already been defined for it. That flow is based on the runbook and the workflow that we mapped from the support process.
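The launcher idea can be sketched as a lookup from event type to remediation flow. This is a minimal Python illustration, not the Operations Orchestration implementation: the event fields, the event type names and the two flows are all assumptions made up for the example.

```python
from typing import Callable

Event = dict[str, str]                  # hypothetical event shape
RemediationFlow = Callable[[Event], str]


def restart_network_device(event: Event) -> str:
    # Stand-in for a real runbook; it only reports what it would do.
    return f"restart requested for {event['node']}"


def clean_up_disk(event: Event) -> str:
    # Stand-in for a real runbook; it only reports what it would do.
    return f"disk cleanup requested for {event['node']}"


# Registry of event types and the flow already defined for each one.
REMEDIATION_FLOWS: dict[str, RemediationFlow] = {
    "network_device_down": restart_network_device,
    "disk_space_low": clean_up_disk,
}


def launch(event: Event) -> str:
    """Identify the event type and run the matching flow, if one exists."""
    flow = REMEDIATION_FLOWS.get(event["type"])
    if flow is None:
        return "no remediation flow defined; leave the event to an operator"
    return flow(event)


print(launch({"type": "network_device_down", "node": "router-01"}))
print(launch({"type": "unknown_event", "node": "server-07"}))
```

With this shape, supporting a new kind of event means adding one entry to the registry, which fits the goal of something that is easy to maintain and extend.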
OMi makes a good thing even better
The Bot can automatically respond to a trigger, and that’s why we call this maturity level auto-triggered. We wanted to be able to connect that auto-trigger ability from the Bot to run Operations Orchestration and whatever remediation runbook the team already had developed. From that point, we started building a clear methodology based on our experience.
Event management is the management of different IT operations events, such as a network device going down or a server’s disk filling up. There are a lot of opportunities to automate the handling of these events. We used Operations Manager i (OMi), which already has some out-of-the-box functionality. The first feature we used is called Advanced Correlation, which tags events as symptoms or causes. This capability allows users to focus on the causes and not waste time on the symptoms.
The Bot watches for an event coming in from OMi and automatically triggers the remediation; we can also enrich the event and the ticket with additional information using OMi Event Annotations. The result is that, at best, we remove this repetitive, rules-based activity from the operator and, at worst, we add helpful technical details such as logs for the operator. The operator in this case is the Network Operations Center analyst, who would otherwise have received that interaction.
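The best-case and worst-case outcomes described above can be expressed in a few lines of Python. This sketch uses an invented in-memory `Event` class and stand-in functions; it does not call OMi or Operations Orchestration, and none of the names below come from those products.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    title: str
    resolved: bool = False
    annotations: list[str] = field(default_factory=list)


def handle(
    event: Event,
    remediate: Callable[[Event], bool],
    collect_details: Callable[[Event], str],
) -> Event:
    """Try the remediation; if it does not fix the event, enrich it instead."""
    if remediate(event):
        # Best case: the operator never has to touch this event.
        event.resolved = True
        event.annotations.append("remediated automatically")
    else:
        # Worst case: the operator still gets the event, with extra context.
        event.annotations.append(collect_details(event))
    return event


# Stand-ins for a real remediation flow and a real log collector.
def failing_remediation(event: Event) -> bool:
    return False


def fake_log_collector(event: Event) -> str:
    return f"logs collected for: {event.title}"


event = handle(Event("network device down"), failing_remediation, fake_log_collector)
print(event.resolved, event.annotations)
```

Either way the operator ends up better off: the event is gone, or it arrives with the technical details already attached.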
Initially there was a lot of fear of automation, because we were taking activities that the analysts were doing and delegating them to what we call a virtual workforce. But management’s intention was not to reduce the staff; it was to optimize the resources.
We ended up moving some people from the level 1 support team to the level 2 support team because they were needed for higher-value activities rather than repetitive ones.
Now the NOC team works side by side with us, and we can continuously improve the remediation activities and maybe add enhancements that we would not otherwise have identified.