It’s no news that today’s IT networks are a complex mixture of technologies that have to work together to support business applications and processes. Networks are the pipes that transport the company’s lifeblood (i.e., data) that every user and back-end system relies on to make decisions. Because single points of failure aren’t show-stoppers in well-designed networks, many enterprises have moved to a performance-oriented management paradigm to stay ahead of issues.
Research shows that the number one cause of network issues is configuration related [1], whether the change was planned or not. Unauthorized changes to network devices can play havoc with even a well-run network, so performance issues triggered by configuration changes are a common occurrence. This means that it’s important to be able to associate configuration changes with performance issues when solving such problems. It’s just not practical to treat them as silos if SLAs are to be met. While Operations and Engineering staff often use disparate tools, a common toolset that provides workflows crossing their silos of data is extremely valuable. Eliminating the time spent figuring out whose screen is correct boosts collaboration and speeds the analysis of what caused the problem.
Performance alone isn’t enough
In Figure 1, we can see typical performance data for a device’s interface in nice-looking graphs, but using this data to solve issues is not easy for most Operations staff. The view also doesn’t correlate data within the performance domain or beyond it, for example with interface or device faults or with configuration changes.
This sort of performance data presentation is best suited for tracking the ongoing status of the existing network and for planning new or enhanced networks. It’s also easy to get lost in this sea of data when you’re under pressure trying to solve problems. Unfortunately, this is commonly the most sophisticated information available to many network professionals.
Fortunately, a good alternative approach is possible with the Diagnostic Analytics system in Network Operations Management (NOM), which visually combines monitoring of configuration changes with performance data. NOM’s Diagnostic Analytics creates a picture that is easy for all to understand and provides more useful data than a hundred-page report.
In Figure 2, it’s clear that something is happening that looks related. What we need is the root cause of an issue to focus efforts and speed triage. The process is like peeling an onion, a layer at a time, as more details are uncovered. We stop peeling that specific onion when we eliminate it as a cause. Diagnostic Analytics does the peeling for us.
Diagnostic Analytics (DA) is a visual correlation view that overlays performance data of your choice with device configuration change events. The most useful type of correlation for this is time-based. Here, two blue vertical lines mark the time windows in which configuration changes occurred. DA can include dozens of performance metrics that have already been collected by NOM.
In this scenario, the first configuration change at 1:00 pm set in motion a spike in the device’s CPU utilization, which hits 100% and may result in a down interface or, at best, a performance brown-out (degradation) that users are likely to notice. There can be many reasons for such a spike, but clearly the device is routing a lot more traffic than it normally does.
The second configuration change appears to be a correction by the network administrator to remediate the problem. CPU utilization then falls back to a normal level. The detailed audit history of configuration changes that NOM keeps can confirm this, which is useful in the later steps of the triage process.
This valuable visualization speeds up users’ understanding of the situation. If there is no visual association, users can move on to other interfaces, devices, or other data. At this stage, eliminating causes is as important as including them.
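The time-based correlation described above can be approximated in a few lines of Python. This is only a minimal sketch, not how NOM implements Diagnostic Analytics: the function name `correlate_changes`, the 30-minute window, the 20-point threshold, and the sample data are all assumptions made for illustration. It compares the average CPU utilization before and after each configuration change and flags changes that coincide with a large shift.

```python
from datetime import datetime, timedelta
from statistics import mean


def correlate_changes(samples, change_times,
                      window=timedelta(minutes=30), threshold=20.0):
    """Flag configuration changes that coincide with a shift in a metric.

    samples      -- list of (timestamp, value) pairs, e.g. CPU utilization in %
    change_times -- timestamps of configuration change events
    window       -- how far to look before and after each change (assumed)
    threshold    -- minimum shift in the mean, in metric units, to flag (assumed)
    """
    results = []
    for changed_at in change_times:
        before = [v for ts, v in samples if changed_at - window <= ts < changed_at]
        after = [v for ts, v in samples if changed_at <= ts < changed_at + window]
        if not before or not after:
            continue  # not enough data around this change to compare
        shift = mean(after) - mean(before)
        results.append((changed_at, shift, abs(shift) >= threshold))
    return results


# Hypothetical data modeled on the scenario above: CPU sits near 20%,
# jumps to 100% after a 1:00 pm change, and recovers after a 2:00 pm change.
start = datetime(2024, 1, 1, 12, 0)
first_change = datetime(2024, 1, 1, 13, 0)
second_change = datetime(2024, 1, 1, 14, 0)

samples = []
for i in range(18):  # one sample every 10 minutes, 12:00 through 14:50
    ts = start + timedelta(minutes=10 * i)
    if ts < first_change:
        cpu = 20.0
    elif ts < second_change:
        cpu = 100.0
    else:
        cpu = 25.0
    samples.append((ts, cpu))

results = correlate_changes(samples, [first_change, second_change])
for changed_at, shift, flagged in results:
    print(f"{changed_at:%H:%M} shift={shift:+.1f} flagged={flagged}")
```

A change whose shift stays below the threshold can be set aside, which mirrors the point that eliminating causes matters as much as finding them.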
What DA hasn’t shown you yet in this visualization is exactly what happened. As part of NOM’s workflow, Diagnostic Analytics provides a single-click drill-down view into the specific configuration changes. In Figure 3, you see a side-by-side view of what was changed during one configuration event. This is a top-level view, with further drill-down available into the specific configuration commands executed and by whom.
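The idea behind such a before-and-after view can be illustrated with Python’s standard `difflib` module. This is a rough sketch under stated assumptions, not NOM’s drill-down: the helper `changed_lines` and both configuration snapshots are hypothetical, invented here only to show how two versions of a device configuration can be compared line by line.

```python
import difflib


def changed_lines(before, after):
    """Return (removed, added) configuration lines between two snapshots."""
    removed, added = [], []
    for line in difflib.ndiff(before, after):
        tag, text = line[:2], line[2:]
        if tag == "- ":
            removed.append(text)
        elif tag == "+ ":
            added.append(text)
        # lines tagged "  " are unchanged; "? " lines are ndiff hints
    return removed, added


# Hypothetical snapshots of one interface stanza, before and after a change.
before = [
    "interface GigabitEthernet0/1",
    " description uplink",
    " ip address 10.0.0.1 255.255.255.0",
]
after = [
    "interface GigabitEthernet0/1",
    " description uplink",
    " ip address 10.0.0.1 255.255.0.0",
]

removed, added = changed_lines(before, after)
print("removed:", removed)
print("added:  ", added)
```

For a rendered side-by-side table rather than two lists, the same module offers `difflib.HtmlDiff().make_table(before, after)`.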
In summary, new visual ways to solve age-old problems can be simple, clear, and powerful. It often just takes a fresh approach with the knowledge of how to present the data you already have at hand. NOM takes data correlations to a new level.
If you see potential for Diagnostic Analytics to improve your network management processes, watch NOM Diagnostic Analytics in action in this informative YouTube video.
Also, check out this webinar recording by EMA on NetSecOps.
1 – Enterprise Management Associates research report