
OpenText OpScope – How to quickly find the root cause of a problem

by   in IT Operations Cloud

Introduction

In the article Technical Overview of Open Text OpScope you learned about OpenText™ OpScope capabilities and the benefits they bring. We also provided a few technical details on how to get up to speed with OpScope. This article describes how to navigate the OpScope UIs to isolate a problem and find its root cause quickly.

Note that the capabilities delivered in OpScope can also be used in Operations Bridge. The two products share components, some of which may be labeled “Operations Bridge.”

How OpScope helps you find the root cause of your problem

At the beginning of your OpScope journey, you see the Application and Service Status page, which displays your monitored applications and their topology. You can view both cloud instances monitored by cloud-native collectors and application monitors.

Figure 1. OpScope: Application and Service Status page

This page shows where you may have problems (Watch List), the underlying topology of the selected application (Health Top View) and the related events (Events).

Here are more details about these UI components:

  • Watch List (upper left) shows a list of your monitored applications. The application status is determined by the severity of the events generated for the underlying resources. You can immediately see the most impacted elements of your monitored environment.
    Selecting an application filters the Health Top View and Events widgets.
  • Health Top View (lower left) displays the resources related to the application selected in the Watch List. Use the Health Top View to identify problematic resources within the impacted application. Selecting a Configuration Item (CI) within that topology filters the Events so that only the events of that CI are displayed.
  • Events (right) displays a list of all events filtered by the selected view, application or configuration item (CI).
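
To make the interplay of the three widgets concrete, here is a minimal Python sketch of how an application status could be rolled up from event severities and how selecting an application or CI narrows the event list. The severity scale, the `Event` class and both helper functions are illustrative assumptions, not OpScope's data model or API.

```python
from dataclasses import dataclass

# Hypothetical severity scale, ordered from least to most severe.
SEVERITY_ORDER = ["normal", "warning", "minor", "major", "critical"]


@dataclass(frozen=True)
class Event:
    application: str
    ci: str        # configuration item the event was raised for
    severity: str
    title: str


def application_status(events, application):
    """Return the worst severity among the application's events."""
    severities = [e.severity for e in events if e.application == application]
    if not severities:
        return "normal"
    return max(severities, key=SEVERITY_ORDER.index)


def filter_events(events, application=None, ci=None):
    """Mimic the Events widget: narrow the list by application and/or CI."""
    return [
        e for e in events
        if (application is None or e.application == application)
        and (ci is None or e.ci == ci)
    ]


# Made-up sample data for illustration only.
events = [
    Event("webshop", "db-01", "major", "Slow queries"),
    Event("webshop", "web-01", "warning", "High CPU"),
    Event("billing", "queue-01", "minor", "Queue depth rising"),
]

print(application_status(events, "webshop"))
print([e.title for e in filter_events(events, application="webshop", ci="db-01")])
```

Taking the maximum severity is only one possible rollup rule; it is used here because it matches the idea that the most impacted elements should surface first.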

A typical workflow of an OpScope SRE user

When you use OpScope for your Site Reliability Engineering practices, a typical workflow, which we call the guided workflow, looks like this:

Figure 2. A typical OpScope SRE workflow

Use the Application and Service Status page to quickly identify problem areas. Click an application to see which part of the application has an issue and view the related events.

Then use the information provided in the Event Summary page of each event to determine possible root causes. Event Summary pages display information about the related CI, along with key related metrics and their values over time. This is often useful for determining where a problem started. It can also help you understand whether other areas are impacted or only a single metric threshold was violated.

From the Event Summary page, you can drill down into the following pages that provide you with more information (available drill-down links depend on the event type and OpScope add-ons you are using):

  • View Event Details shows the details of the related event.
  • Quick Report lets you view related metrics.
  • View Dashboard (Cloud events) opens a cloud monitoring dashboard.
  • Multivariate Anomaly Details (MVAD events) helps you isolate metrics that contributed most to the anomaly.
  • Application Observability (cloud-native applications) allows you to access the application telemetry data from OpenTelemetry (OTel) instrumented applications.

All Event Summary pages have the following widgets:

  • Event Details. Contains the event severity and the time the event was generated, along with the following options:
    The View Event Details option displays details such as the event description, priority and instructions.
    The Quick Report option allows you to see more metrics associated with the service.
    The Application Observability option allows you to access critical application telemetry data, such as metrics, logs and traces. (Application observability uses OTel instrumentation and collection to trace your cloud-native applications and provides guided workflows to troubleshoot the root cause of an application problem.)
    Note that application observability is currently available as a Technology Preview. Contact your OpenText sales representative if you’d like to participate in the Technology Preview.
    The View Dashboard option opens a cloud monitoring dashboard and is available for cloud events.
    The Remediate Event button allows you to take remedial actions on events. It is available for all events and lets you execute Operations Orchestration runbooks (Operations Orchestration SaaS license required).

  • Event Metrics. Offers a metric selection table in which the metric that triggered the event is pre-selected. It also displays the event on the chart. You can select other metrics associated with the impacted instance to compare them with the metric that triggered the event.
    Use the Event Metrics widget to plot associated metrics on the same graph to identify correlations between them around the time the event occurred.
  • Instance Details. Displays instance details such as tags, status, uptime, CPU, memory and the operating system.
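
The visual comparison done in the Event Metrics widget can also be approximated numerically. The following Python sketch ranks candidate metrics by how strongly they correlate with the triggering metric in a window around the event. The metric names, sample values and the `window` and `pearson` helpers are assumptions made for illustration; OpScope itself is not described as computing correlations this way.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


def window(series, event_index, radius):
    """Slice the samples within `radius` points of the event."""
    start = max(0, event_index - radius)
    return series[start:event_index + radius + 1]


# Made-up samples; the event fired at index 5 on cpu_util.
event_index = 5
trigger = [20, 22, 21, 25, 60, 85, 90, 88]            # cpu_util
candidates = {
    "response_time_ms": [110, 115, 112, 130, 400, 700, 760, 740],
    "disk_free_pct": [50, 50, 49, 50, 49, 50, 50, 49],
}

trigger_window = window(trigger, event_index, radius=3)
ranked = sorted(
    ((name, pearson(trigger_window, window(values, event_index, radius=3)))
     for name, values in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, r in ranked:
    print(f"{name}: r = {r:.2f}")
```

A high coefficient only indicates that two metrics moved together around the event; it does not by itself show which one is the cause.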

In the screenshots below, you can see sample Event Summary pages that you can access from the Application and Service Status page.

Figure 3. Event Summary (agentless monitoring), including a link to Application Observability

Figure 4. Event Summary for a Cloud event, including display of an MVAD Anomaly event occurrence

Figure 5. Event Summary for an MVAD anomaly event, with a link to view anomaly details

The subsequent UI pages offer you drilldowns into more specific OpScope UIs.

The Multivariate Anomaly Details page (Figure 6) provides detailed information in the following widgets:

  • Metrics and Anomalies. Displays a line graph of the time-series data, with all metrics labeled and associated with their CIs. The metrics are listed below the graph and the anomalies are placed above the graph and are marked by an anomaly icon. Individual metrics can be highlighted by selecting them from the bottom of the pane. You can also drill down into Metric Details for each metric.
  • Contribution Score. Shows a graph of all metrics that contributed to the anomaly and lets you see which metric contributed the most (as a percentage).
  • CI Information. Lists basic information about the related CI, such as its ID, type and public address.
  • Health Top View. The CI icons in Health Top View provide a visual indication of the health of the related CI, based on the hierarchy tree structure defined for each view.
  • Events. Displays a list of related MVAD anomaly events with their title, severity, time when they were received and the related CI information. You can switch between different events to view the corresponding information shown in the widgets.
  • Time Range Selector. Allows you to select a time range that you want to apply to all the widgets available on a page. You can choose either an absolute or a relative time range for a time zone.
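
To illustrate what a percentage contribution score means, the Python sketch below normalizes each metric's deviation from its baseline into a share of the total deviation. This is a deliberately simple stand-in: the article does not describe how OpScope calculates contribution scores, and the z-score approach, the function names and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def z_score(baseline, observed):
    """Absolute deviation of `observed` from the baseline, in standard deviations."""
    sigma = pstdev(baseline)
    if sigma == 0:
        return 0.0  # a constant baseline gives no scale to measure against
    return abs(observed - mean(baseline)) / sigma


def contribution_scores(baselines, observed):
    """Return each metric's share (in percent) of the total deviation."""
    deviations = {
        name: z_score(samples, observed[name])
        for name, samples in baselines.items()
    }
    total = sum(deviations.values())
    if total == 0:
        return {name: 0.0 for name in deviations}
    return {name: 100.0 * dev / total for name, dev in deviations.items()}


# Made-up baseline samples and the values seen at the time of the anomaly.
baselines = {
    "cpu_util": [20, 22, 21, 19, 23],
    "mem_util": [60, 61, 59, 60, 60],
    "net_errors": [0, 1, 0, 1, 0],
}
observed = {"cpu_util": 90, "mem_util": 62, "net_errors": 1}

scores = contribution_scores(baselines, observed)
for name, pct in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {pct:.1f}%")
```

The scores always sum to 100, so the top entry can be read the same way as the largest contributor in the Contribution Score widget.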

Figure 6. Multivariate Anomaly Details page

On a Quick Report page, you can see more metrics associated with the service or monitor, as shown in Figure 7.

Figure 7. A quick report

When drilling down into Application Observability, you can see the service map of an application, where services with errors are shown in red. You can review the service list, which shows errors, latency and requests per second for a service or a transaction. You can also drill down into the traces of a transaction that shows errors by clicking the Analyze Traces button.

Figure 8. Application Observability including the link to analyze traces
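
The per-service figures in the service list (errors, latency, requests per second) are aggregates over span data. The following Python sketch shows one way such a summary could be derived. The `Span` class and `summarize_services` function are hypothetical and are not part of the OpenTelemetry SDK or of OpScope.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    service: str
    duration_ms: float
    is_error: bool


def summarize_services(spans, window_seconds):
    """Aggregate request rate, error rate and average latency per service."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    grouped = defaultdict(list)
    for span in spans:
        grouped[span.service].append(span)

    summary = {}
    for service, items in grouped.items():
        errors = sum(1 for s in items if s.is_error)
        summary[service] = {
            "requests_per_second": len(items) / window_seconds,
            "error_rate": errors / len(items),
            "avg_latency_ms": sum(s.duration_ms for s in items) / len(items),
        }
    return summary


# Made-up spans collected over a two-second window.
spans = [
    Span("checkout", 100, False),
    Span("checkout", 120, False),
    Span("checkout", 900, True),
    Span("checkout", 80, False),
    Span("cart", 50, False),
    Span("cart", 70, False),
]

summary = summarize_services(spans, window_seconds=2)
for service, stats in summary.items():
    flag = "ERRORS" if stats["error_rate"] > 0 else "ok"
    print(service, flag, stats)
```

Services with a non-zero error rate are the ones a service map would highlight, which makes them natural starting points for trace analysis.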

The Analyze Traces page (Figure 9) shows you individual traces and their duration. You can sort the traces by duration and then analyze the ones that took the longest. The details show which span took the most time. If needed, you can also review the corresponding logs that were created for that specific transaction (Figure 10).

Figure 9. Analyze traces

Figure 10. View log details of the transaction
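
The same analysis can be expressed in a few lines of Python: sort traces by total duration, then find the span that accounts for the most time within the slowest trace. The `Span` class, the trace data and the helper functions are hypothetical. The self-time calculation assumes that child spans run one after another without overlapping, which real traces do not always satisfy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]   # None marks the root span of a trace
    name: str
    duration_ms: float


def trace_duration(spans):
    """Duration of a trace, taken from its root span."""
    return max(s.duration_ms for s in spans if s.parent_id is None)


def self_times(spans):
    """Time spent in each span itself, excluding its direct children."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration_ms
    return {s.span_id: s.duration_ms - child_total.get(s.span_id, 0.0) for s in spans}


def slowest_span(spans):
    """Span with the largest self time within one trace."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s.span_id])


# Made-up traces for illustration only.
traces = {
    "trace-b": [
        Span("b1", None, "GET /cart", 120),
        Span("b2", "b1", "cache lookup", 20),
    ],
    "trace-a": [
        Span("a1", None, "GET /checkout", 950),
        Span("a2", "a1", "auth", 50),
        Span("a3", "a1", "SELECT orders", 800),
    ],
}

by_duration = sorted(traces, key=lambda tid: trace_duration(traces[tid]), reverse=True)
worst = by_duration[0]
print(worst, trace_duration(traces[worst]), slowest_span(traces[worst]).name)
```

Using self time rather than raw duration keeps the root span, which always covers the whole transaction, from being reported as the bottleneck.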

You can pass on this detailed information to the developers to let them know where exactly their code caused a delay or a performance problem.

Note: You can also create a Service Management Automation X (SMAX) ticket for a selected event by right-clicking it and selecting Transfer control to Trouble Ticket.

We have also covered OpScope in our Operations Bridge Release Readiness Webinar. Have a look at our recording to see a short demo on how to use OpScope. The slides and the recording are available on our Community page here.

We encourage you to try out our new features and enhancements! For further information on our offerings, visit the OpenText OpScope product page, explore our documentation resources and check out our blogs.

If you have feedback or suggestions, don’t hesitate to comment on this article below.




Read all our news at the Operations Bridge blog. 

Have technical questions about OpScope or Operations Bridge? Visit the Operations Bridge User Discussion Forum.  

Keep up with the latest Tips & Information about OpScope or Operations Bridge.  

Do you have an idea, a product enhancement request or a technical question? Visit our Community website with the links to Discussions, Idea Exchange and Tips, News and Events. 
