Operations Bridge AIOps: Latest Changes in Automatic Event Correlation

by   in IT Operations Cloud

Automatic Event Correlation (AEC), part of the OpenText™ Operations Bridge AIOps platform, is the analytic capability on top of the OPTIC Data Lake that automatically correlates events using a machine-learning algorithm. AEC analyzes patterns in the event stream and uses them to group events that, with high probability, originate from the same problem, and to determine the probable root cause. Each group of related events is transformed into a single (correlated) event in OBM. This grouping simplifies event processing for operators: all related events are shown together, making it easier to identify and work on the root cause. Closing the root cause event automatically closes all associated events.
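To make the last point concrete, here is a minimal Python sketch of a correlated event whose closure cascades to its related events. It is an illustration only, not the OBM data model: the Event and CorrelatedEvent classes, their fields and the sample events are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # Hypothetical, simplified event record (not the OBM event schema).
    event_id: str
    title: str
    state: str = "open"


@dataclass
class CorrelatedEvent:
    """Hypothetical stand-in for a correlation event: one cause plus related events."""
    root_cause: Event
    related: list[Event] = field(default_factory=list)

    def close_root_cause(self) -> None:
        # Closing the cause also closes every associated event.
        self.root_cause.state = "closed"
        for event in self.related:
            event.state = "closed"


cause = Event("e1", "Database down")
symptoms = [Event("e2", "App response time high"), Event("e3", "Login failures")]
group = CorrelatedEvent(cause, symptoms)
group.close_root_cause()
```

After the call, the cause and both symptom events are in the closed state, which is the behavior an operator sees when closing the root cause of a correlation.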

Over the past few releases, we have made this functionality more intuitive and added a number of useful features. We added the AEC Explained UI, which explains how AEC does its magic. We have also introduced and improved the identification of the probable root cause, as well as numerous tuning possibilities.

Just recently, in our Operations Bridge CE 23.4 release, we added the AEC Preview mode and Tuning UI:

  • AEC Preview mode: In a new deployment, AEC starts in preview mode. This lets you explore how AEC is working in the AEC Explained UI and fine-tune it before any correlation events are sent to OBM, so you can confirm that it meets your needs before exposing the correlations to your operators.
  • AEC Tuning UI: You can fine-tune AEC for better event reduction by influencing the root cause algorithm so that it identifies the proper root cause more reliably. You can also instruct AEC to exclude certain events from correlation altogether.

The Preview mode and the AEC Explained UI

For new deployments, AEC is in preview mode automatically. You can start using AEC and then use the AEC Explained UI to see what it did. AEC works out of the box with default settings, but it can also be tuned to your needs. Please note that AEC needs to run for several days or weeks to detect patterns in your event stream. So deploy it, let it run and do its magic, and come back after 2 weeks to check in the AEC Explained UI what it detected.

The following UI pages are available in the AEC Explained UI:

  • AEC Overview with the general information, such as:
    • A chart displaying the event count over time (the overall number of events in your system and the events correlated by AEC). You can specify the timeframe you are interested in with the time range selector.
    • General information on existing topology partitions, correlation groups and the events correlated by AEC, with the option to drill down for the detailed analysis.
  • Drilldowns from the AEC landing page to partitions, correlation groups and occurrences. In the Partitions UI, you can see all topology partitions in which correlation patterns/groups can be detected. The Correlation Groups UI shows you all patterns that have been detected. The Occurrences UI provides you with the key details on all correlation events that occurred during a certain time period.
  • Pages with detailed information for individual correlation groups, correlation events and root cause details.
  • A page with a topology context to which a certain correlation pattern belongs.

You can promote the desired correlation groups or prohibit the undesired groups in the AEC Explained UI. See Use the AEC Explained UI -> Group Details:

  • If AEC already detected a pattern for one correlation group, you can instruct it to apply this pattern to other correlation groups by promoting this correlation group. The "Promote this correlation group" action ensures that a specific group of events is always correlated. The system will look for this group of events globally, expanding the possibilities of it being found.
  • If you do not want to correlate a group of events, you can instruct the system to prohibit a specific correlation group. The "Prohibit this correlation group" action stops a group of events from being correlated.

Additionally, the AEC Explained UI allows you to select a different root cause, overriding the one suggested by AEC. AEC will then use the selected event as the root cause the next time the correlation occurs.

Figure 1: The AEC Explained UI – Occurrence Details with the Set as root cause option

Another way to influence which events are detected as the root cause is to use the new Tuning UI.

The AEC Tuning UI

Our new AEC Tuning UI allows you to adjust different areas of AEC for your specific data. Tuning options allow you, for example, to influence which events will be shown as a possible root cause and to exclude normal events from correlations.

Figure 2: The AEC Tuning UI

If your event titles contain important keywords or phrases that can help identify the root cause, you can increase their Title Keywords scores. If your experience shows a certain CI Type is more likely to be the root cause of an event, set that CI Type to a higher score.
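As a rough illustration of how such scores could steer root cause selection, the Python sketch below adds up keyword and CI Type scores for each event and picks the highest. This is an assumption made for the sake of the example and not AEC's actual root cause algorithm; the score tables, the numbers and the function names are hypothetical.

```python
# Hypothetical tuning tables; names and numbers are illustrative, not product defaults.
TITLE_KEYWORD_SCORES = {"down": 5, "unreachable": 4, "timeout": 1}
CI_TYPE_SCORES = {"database": 3, "node": 2, "application": 1}


def root_cause_score(title: str, ci_type: str) -> int:
    """Sum the scores of matching title keywords and add the CI Type score."""
    words = title.lower().split()
    keyword_score = sum(TITLE_KEYWORD_SCORES.get(word, 0) for word in words)
    return keyword_score + CI_TYPE_SCORES.get(ci_type, 0)


def pick_root_cause(events: list[tuple[str, str]]) -> tuple[str, str]:
    """Return the (title, ci_type) pair with the highest combined score."""
    return max(events, key=lambda event: root_cause_score(*event))


events = [
    ("Login timeout", "application"),   # 1 + 1 = 2
    ("Database down", "database"),      # 5 + 3 = 8
    ("Host unreachable", "node"),       # 4 + 2 = 6
]
best = pick_root_cause(events)
```

With these made-up values, the "Database down" event wins. Raising the score of another keyword or CI Type in the tables would shift the choice, which is the kind of influence the tuning options are meant to give you.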

Note: AEC can automatically determine correlation groups by reading the RTSM topology data and checking which CIs are linked with Impact Relationships. However, topology is not mandatory; without it, AEC will create a partition for each Node CI and analyze its events.
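One way to picture this is to treat CIs as nodes and Impact Relationships as links, and to take each connected set of CIs as one partition. The Python sketch below does exactly that. It is a simplified assumption for illustration, not the documented AEC partitioning logic, and the function name and sample CI names are hypothetical.

```python
from collections import defaultdict


def build_partitions(cis, impact_links):
    """Group CIs into partitions: connected components over impact links.

    With no links at all, every CI ends up in its own partition.
    """
    neighbours = defaultdict(set)
    for a, b in impact_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, partitions = set(), []
    for ci in cis:
        if ci in seen:
            continue
        # Depth-first walk collecting everything reachable from this CI.
        stack, component = [ci], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        partitions.append(component)
    return partitions


cis = ["web01", "app01", "db01", "mail01"]
with_topology = build_partitions(cis, [("web01", "app01"), ("app01", "db01")])
without_topology = build_partitions(cis, [])
```

With the two sample links, web01, app01 and db01 share a partition and mail01 stands alone. Without any links, each of the four CIs gets its own partition, mirroring the per-node fallback described in the note.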

If you have certain CIs or nodes for which the events should never be correlated, use the Exclude functionality to remove all events coming from a specific host or other CI from correlations. To explore the full list of tuning options, see Automatic Event Correlation Tuning UI and preview mode.

When you are done tuning and exploring AEC, you can switch off the preview mode in the Tuning UI and start sending correlation events to OBM to facilitate event processing for your operators. Correlation events make it easier for them to identify and work on the root cause. In addition, closing the cause event closes all associated events, which saves operators time and effort.

Note: These two new features are marked as Technology Preview for now, as we are keen on fine-tuning this UI further according to your needs. Please provide your feedback via the Operations Bridge Idea Exchange.

We encourage you to try out our new features and enhancements! For further information on our offerings, visit the Operations Bridge product page, explore our documentation resources and check out our videos and blogs.

If you have feedback or suggestions, don’t hesitate to comment on this article below.

Explore the full capabilities of Operations Bridge by taking a look at these pages on our Practitioner Portal: Operations Bridge Manager, SiteScope, Operations Agent, Operations Connector (OpsCx), Operations Bridge Analytics, Application Performance Management (APM) and Operations Orchestration (OO).



Read all our news at the Operations Bridge blog. 

Have technical questions about Operations Bridge? Visit the Operations Bridge User Discussion Forum. 

Keep up with the latest Tips & Information about Operations Bridge.  

Do you have an idea or Product Enhancement Request about Operations Bridge? Submit it in the Operations Bridge Idea Exchange.

