
IT Is Hitting the Complexity Wall. AIOps and Observability Can Help.


In your quest for competitive advantage with cloud applications, have you hit the complexity wall?  

The complexity wall

Our move to hybrid and multicloud has increased complexity beyond what IT can manage, according to David Linthicum, a leading technology innovator and influencer. In his recent paper, Observability Trends and Pragmatic Techniques to Optimize Multicloud Operations, he points out that the push to use best-of-breed technology delivers clear business value but carries a huge downside: a level of complexity that can’t be avoided.

Figure 1. Complexity rises as we add more computer systems, cloud or not, which adds to the cost of operations, security, and governance. All deployments will reach a tipping point where more systems generate more complexity than can be cost justified.

Many organizations have already hit that complexity wall, where the complexity and heterogeneity of the IT estate exceed IT’s ability to manage them effectively. As a result, many of those enterprises will experience negative value.

Figure 2. Negative value resulting from complexity.

Conquering complexity with observability

As with many problems, there are tools that can help. In this case, the tools are AIOps and observability.

Figure 3. Discovery, monitoring, observability, and AIOps.

Discovery and monitoring are well understood in the IT space, but the definitions of observability and AIOps are still fluid. For this discussion, observability takes monitoring data from all areas of the IT estate, analyzes it, and shows you the problem your service is having. Dashboards and reports help you visualize the problem, but the analysis that pinpoints it is perhaps more important.

Suppose you have an IT landscape with 10K items: a combination of compute (servers, VMs, containers), storage, and network devices. On that infrastructure run, say, 1,000 applications and services. If you monitor each of those 11K items at 1-minute intervals, you receive 11K data points per minute. Log files are also important sources of status information. Conservatively, let’s say 1 log file entry per minute per item. That’s an additional 11K data points, for a total of 22K every minute.
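The arithmetic above can be sketched in a few lines of Python. The item counts and the one-sample-per-minute rates come from the example in the text. The variable and function names, and the per-day extrapolation, are illustrative assumptions added here.

```python
# Back-of-the-envelope data volume, using the figures from the example above.
infrastructure_items = 10_000  # compute, storage, and network devices
applications = 1_000           # applications and services running on them

monitored_items = infrastructure_items + applications

metrics_per_item_per_minute = 1  # one monitoring sample per item per minute
logs_per_item_per_minute = 1     # conservative: one log entry per item per minute


def data_points_per_minute(items, metrics_rate, logs_rate):
    """Monitoring samples plus log entries arriving each minute."""
    return items * (metrics_rate + logs_rate)


per_minute = data_points_per_minute(
    monitored_items, metrics_per_item_per_minute, logs_per_item_per_minute
)
per_day = per_minute * 60 * 24  # extrapolation, assuming the rate stays constant

print(f"{per_minute:,} data points per minute")  # 22,000
print(f"{per_day:,} data points per day")        # 31,680,000
```

Even at these conservative rates, the estate produces tens of millions of entries per day, which is why manual review does not scale.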

A user calls to report a problem with an application. I doubt any ITOps team could review 22K entries and spot the problem within the minute before the next set of data arrives.

The visualization in Figure 4 makes it much easier to see where to start solving the problem. We have an application at the top; all its components are shown with their respective status.



Figure 4. Topology or dependency map showing which devices and services are part of the application.
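To make the idea behind such a map concrete, here is a minimal sketch of how a dependency map can narrow down where to start. It assumes a hypothetical application called web-shop with made-up component names and statuses. It is not how any particular product builds or analyzes its topology, only an illustration of walking from the application down through its dependencies.

```python
from collections import deque

# Hypothetical topology: each node lists the components it depends on.
DEPENDS_ON = {
    "web-shop": ["web-server", "order-service"],
    "web-server": ["vm-01"],
    "order-service": ["vm-02", "orders-db"],
    "orders-db": ["storage-array-1"],
    "vm-01": [],
    "vm-02": [],
    "storage-array-1": [],
}

# Hypothetical current status per component, as monitoring might report it.
STATUS = {
    "web-shop": "critical",
    "web-server": "ok",
    "order-service": "critical",
    "orders-db": "critical",
    "vm-01": "ok",
    "vm-02": "ok",
    "storage-array-1": "critical",
}


def unhealthy_components(app, depends_on, status):
    """Breadth-first walk from the application; collect every non-ok node."""
    seen = {app}
    queue = deque([(app, 0)])
    findings = []
    while queue:
        node, depth = queue.popleft()
        state = status.get(node, "unknown")
        if state != "ok":
            findings.append((depth, node, state))
        for dep in depends_on.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append((dep, depth + 1))
    return findings


def likely_root_causes(findings, depends_on, status):
    """Unhealthy nodes whose own dependencies are all healthy."""
    return [
        node
        for _, node, _ in findings
        if all(status.get(dep) == "ok" for dep in depends_on.get(node, []))
    ]


findings = unhealthy_components("web-shop", DEPENDS_ON, STATUS)
roots = likely_root_causes(findings, DEPENDS_ON, STATUS)

for depth, node, state in findings:
    print("  " * depth + f"{node}: {state}")
print("Start looking at:", roots)
```

In this made-up example, four of the seven components are unhealthy, but only storage-array-1 has no unhealthy dependency of its own, so it is the natural place to start.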

AIOps—beyond observability

AIOps adds another dimension of help by fixing the problem with automation. One of our customers fixes 95% of known problems with no human intervention and saves about $4M annually. Even those unwilling to take a hands-off approach can benefit greatly. Automated runbooks that an operator triggers manually save typing time and, more importantly, avoid typing errors. Many major outages are due to the wrong command being typed. Further, with tested runbooks, the power to solve the problem can move from expensive subject matter experts (SMEs) to lower support levels.
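The pattern described above, known problems mapped to tested runbooks with a choice between hands-off and operator-triggered execution, might look like the following sketch. The problem names, event fields, and remediation functions are all hypothetical stand-ins that only return a description of what they would do. A real runbook would call your own automation tooling.

```python
# Hypothetical remediation steps. Each only describes the action it stands for.
def restart_service(event):
    return f"restarted {event['target']}"


def clear_temp_files(event):
    return f"cleared temp files on {event['target']}"


# Hypothetical runbook catalog: known problem signature -> tested remediation.
RUNBOOKS = {
    "service_down": restart_service,
    "disk_full": clear_temp_files,
}


def handle_event(event, runbooks, auto_remediate=True):
    """Fix known problems automatically, suggest a runbook, or escalate."""
    runbook = runbooks.get(event["problem"])
    if runbook is None:
        # Unknown problem: no tested runbook exists, so a person must decide.
        return ("escalate", f"no runbook for {event['problem']}")
    if not auto_remediate:
        # Hands-on mode: an operator triggers the tested runbook, no typing.
        return ("suggest", runbook.__name__)
    return ("fixed", runbook(event))


events = [
    {"problem": "service_down", "target": "order-service"},
    {"problem": "disk_full", "target": "vm-02"},
    {"problem": "certificate_expired", "target": "web-server"},
]

results = [handle_event(event, RUNBOOKS) for event in events]
manual = handle_event(events[0], RUNBOOKS, auto_remediate=False)

for outcome, detail in results:
    print(f"{outcome}: {detail}")
print("manual mode ->", manual)
```

Keeping the runbook lookup separate from the decision to execute lets the same tested runbook serve both the hands-off and the operator-triggered approach.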


Circling back, we can now answer these questions: “What is observability?” and “Why do I need it?” 

Observability is the capability of a tool to give you the information to see what’s happening inside complex systems. You need it because the complexity and volume of data are too high to handle manually. As a bonus, AIOps builds on observability and provides automation to help fix the problem once it is found.

Observability means different things depending on who you talk to. We must move from abstract concepts to ones that create specific objectives for IT operations, including cloud and legacy systems.

If you're operating in a multicloud IT estate, you can tackle complexity with pragmatic techniques and an observability maturity model. Read this Linthicum Research white paper to learn how to rein in complexity and return more value to the business.
