In this blog, we would like to discuss the concepts and challenges of the cluster-aware monitoring and take a look at how it is implemented in Operations Bridge. This post assumes that you are familiar with the general principles of high availability and cluster setups; nevertheless, we will briefly explain a few conceptual basics relevant to this topic.
Note: This blog focuses on the MS SQL Server cluster. Note that other types of cluster software are supported as well. For details on which cluster software and which versions are supported, see the latest Operations Bridge support matrices.
A highly available cluster is a group of independent nodes (physical or virtual machines) that work together to increase the availability of the applications running in the cluster. If one or more cluster nodes fail, other nodes take over the workload of the node that is no longer available (this is known as a failover). Generally, the more redundant resources your cluster has, the higher its failure tolerance and the better its scalability.
Applications running in cluster environments are configured as high-availability resource groups. A high-availability resource group (HARG) is a generic term for cluster objects representing highly available applications running on one of the physical nodes of the cluster. A HARG is set up differently depending on the cluster software you are using; in case of Microsoft SQL Server running inside a Windows Cluster, a typical cluster consists of two physical nodes and a HARG that runs a single Microsoft SQL instance hosting a single application (database). A virtual node is linked to a HARG and used to communicate with that HARG.
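To make these relationships concrete, here is a minimal Python sketch of the cluster objects just described: two physical nodes, a HARG running a SQL Server instance, and a virtual node linked to that HARG. This is purely illustrative; the class, host, and resource group names are invented for this example, not actual Operations Bridge or Windows Cluster identifiers.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the cluster objects described above:
# two physical nodes, one HARG, and a virtual node linked to the HARG.
# All names are illustrative.

@dataclass
class PhysicalNode:
    hostname: str

@dataclass
class ResourceGroup:                   # the HARG
    name: str
    active_node: PhysicalNode          # node currently hosting the group

@dataclass
class VirtualNode:
    hostname: str                      # virtual host name used to reach the HARG
    ip_address: str
    harg: ResourceGroup                # a virtual node is linked to one HARG

node_a = PhysicalNode("sqlclu-phys1")
node_b = PhysicalNode("sqlclu-phys2")
harg = ResourceGroup("SQL Server (MSSQLSERVER)", active_node=node_a)
vnode = VirtualNode("sqlclu-virt", "10.0.0.42", harg)

print(vnode.harg.active_node.hostname)  # -> sqlclu-phys1
```

After a failover, the cluster software would simply point `active_node` at the other physical node, while the virtual node (and its host name and IP address) stays the same.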
Note: Other Microsoft SQL Server cluster configurations are possible as well; for more information, please refer to the Microsoft SQL Server documentation.
Challenges for monitoring applications
If the physical node that currently hosts the HARG goes down, the HARG is switched over to another physical node by the cluster software.
From a monitoring perspective, this creates a problem: the monitoring must immediately switch as well – to the node now hosting the application – to ensure that application monitoring continues without disruption. This cannot be handled manually, as failovers may occur at any time; instead, the task must be automated. To achieve this, the monitoring software needs to be aware of the cluster. In addition, it must know that the node to which it deployed monitoring is actually a virtual node.
Solution 1: Cluster-aware monitoring in Operations Manager using virtual nodes
To address the challenges mentioned above, Operations Manager for Linux and Windows introduced the concept of virtual nodes on the management server and allowed deployment of policies to a virtual node representing a HARG (with the effect that the policies were then automatically deployed to all physical nodes). Additionally, the Operations Agent was made cluster-aware, so that it could detect which HARG was running on a physical cluster node and enable policies accordingly. It was also enhanced to detect failovers.
Solution 2: Cluster-aware monitoring in Operations Bridge Manager (OBM)
OBM offers a similar feature, but it goes one step further, since monitoring in OBM is typically assigned to application CIs, and not to (physical or virtual) nodes.
So how does it work?
OBM recognizes clustered applications when it detects certain relationships between application CIs and virtual node/HARG CIs in the RTSM. Next (as it is now cluster-aware), it deploys monitoring automatically to all corresponding physical nodes. As with OM, the Operations Agents then detect where the HARG is currently hosted and enable monitoring on that node only.
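The two-step flow just described can be sketched in a few lines of Python – a hypothetical, simplified model, not the actual OBM or Operations Agent API: the server side deploys the policies to all physical nodes of the cluster, and each agent then enables them only on the node currently hosting the HARG.

```python
# Illustrative sketch (not the actual OBM API) of the two-step flow
# described above: OBM deploys policies to ALL physical nodes of the
# cluster, and each agent enables them only where the HARG runs.

def deploy_policies(physical_nodes, policies):
    """Server side: push the same policies to every physical node."""
    return {node: list(policies) for node in physical_nodes}

def agent_policy_state(node, harg_active_node, deployed):
    """Agent side: enable policies only if this node hosts the HARG."""
    state = "enabled" if node == harg_active_node else "disabled"
    return {policy: state for policy in deployed[node]}

nodes = ["phys1", "phys2"]
deployed = deploy_policies(nodes, ["MSSQL discovery", "MSSQL health"])
print(agent_policy_state("phys1", "phys1", deployed))  # all enabled
print(agent_policy_state("phys2", "phys1", deployed))  # all disabled
```

On a failover, only the second argument (`harg_active_node`) changes, and the enabled/disabled states flip accordingly – no redeployment is needed, which is exactly why deploying to all physical nodes up front makes the switch instantaneous.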
This is a generic concept that can be applied to different cluster environments and clustered applications. For more information on OBM cluster awareness, see the topic Virtual Nodes on the Micro Focus Documentation Portal.
Now let us take a closer look at cluster-aware monitoring by reviewing the following example – monitoring of a clustered Microsoft SQL Server.
Microsoft SQL Server is proactively monitored by the OBM Management Pack for MS SQL Server. It discovers the clustered Microsoft SQL Server and creates a corresponding cluster model in the RTSM, thus making OBM aware of the cluster environment. In OBM, monitoring aspects are then assigned to the Microsoft SQL Server CI directly. The person assigning an aspect does not even have to know whether the SQL Server is clustered. OBM then deploys monitoring to the corresponding physical nodes. See the OBM Management Pack for Microsoft SQL Server documentation on the Micro Focus documentation portal for detailed information on how to monitor Microsoft SQL Server instances with the Management Pack.
The policies are enabled on the active node (the node hosting the HARG) and are disabled on passive nodes as shown in the following screenshots:
Policy Status on the Active Node
Policy Status on the Passive Node
If a failure occurs in the cluster environment (the active node that hosts the monitored application crashes), the application is moved over to the other node in the cluster. It is important that the monitoring switches as well, so that the application continues to be monitored without downtime. In such a case, the Operations Agent automatically determines which node now hosts the application/HARG and enables monitoring on it. Moreover, as this functionality is available out of the box, it takes only a little effort to configure and use: all you need to do is assign monitoring to the Microsoft SQL Server CI you want to monitor; the rest is taken care of by Operations Bridge.
Note: Another option is to assign monitoring aspects to a virtual node (similar to the functionality available with Operations Manager, as already described in this post). This is a possible but not recommended scenario, as we encourage our customers to take advantage of the CI-based (topology-centric) approach as opposed to the node-based approach practiced with OM. To learn about the benefits of the CI-centric approach and get information on how to establish the RTSM topology, please see this topic in the Operations Bridge Help Center.
Below you see the RTSM model discovered out of the box by the Management Pack for Microsoft SQL Server 2018.11.
The diagram shows the model that has to be in place to reflect a HARG running inside a cluster.
The relation of the Microsoft SQL Server to the physical node no longer exists; instead, you see the relation to the virtual node (or the HARG), which results in the discovery of the HARG.
Note: Virtual nodes are presented in the RTSM as CIs of the cluster_resource_group type. A virtual node CI is assigned the host name and the IP address belonging to its HARG. The central attribute of a virtual node CI is its HARG name.
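For illustration, such a CI could be represented by a structure like the following. This is a hypothetical sketch: only the `cluster_resource_group` CI type comes from the text above, while the attribute names and values are simplified stand-ins, not the exact RTSM schema.

```python
# Hypothetical sketch of a virtual node CI as described above.
# Only the CI type name comes from the text; the attribute names
# and values are illustrative, not the exact RTSM schema.
virtual_node_ci = {
    "ci_type": "cluster_resource_group",      # CI type used for virtual nodes
    "host_name": "sqlclu-virt",               # host name belonging to the HARG
    "ip_address": "10.0.0.42",                # IP address belonging to the HARG
    "harg_name": "SQL Server (MSSQLSERVER)",  # central attribute: the HARG name
}
print(virtual_node_ci["harg_name"])
```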
Keep in mind that you can apply this concept to other applications for which the RTSM model is not provided out of the box. In this case, you would need to manually create a model similar to the one shown in the diagram above, or discover it.
Additional Benefits and Considerations
With the support of cluster awareness, virtual nodes can be treated in the same way as regular managed nodes. This provides the following additional advantages:
- Events created in the scope of a HARG can be associated with its virtual node instead of the physical node
- Correct filtering and highlighting in the event browser and service navigation
- The ability to run tools on virtual nodes (on the node where the HARG is active)
Note: The state of policies enabled through cluster awareness cannot be changed from OBM, as the Operations Agent enables or disables policies based on the HARG (CRG) state on the physical node.
In addition, consider the following:
- The Operations Agent must be installed on every node in the HA cluster
- A virtual node can be associated with only one HA resource group name
- A HARG name can be assigned to more than one virtual node, but these virtual nodes should not share any common physical nodes. This is because any policy assigned to both virtual nodes would be associated with the same HARG name, and the cluster awareness feature would not be able to distinguish the virtual nodes.
- We do not yet provide a command-line utility to add or modify virtual node CIs (however, a virtual node CI can be added directly in the RTSM). Also, the Monitoring Automation UI does not yet offer the ability to add or modify virtual nodes
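The shared-physical-node constraint above lends itself to a quick sanity check. The following sketch is purely illustrative (`valid_virtual_nodes` is a hypothetical helper written for this post, not an Operations Bridge tool): it flags any pair of virtual nodes that carry the same HARG name while overlapping in physical nodes.

```python
from itertools import combinations

# Sketch of the constraint above: virtual nodes that share a HARG name
# must not share any physical node, otherwise the cluster awareness
# feature cannot tell them apart. Data layout is illustrative.

def valid_virtual_nodes(virtual_nodes):
    """virtual_nodes: list of (harg_name, set_of_physical_nodes) pairs."""
    for (harg_a, nodes_a), (harg_b, nodes_b) in combinations(virtual_nodes, 2):
        if harg_a == harg_b and nodes_a & nodes_b:
            return False  # same HARG name on overlapping physical nodes
    return True

ok = [("HARG1", {"phys1", "phys2"}), ("HARG1", {"phys3", "phys4"})]
bad = [("HARG1", {"phys1", "phys2"}), ("HARG1", {"phys2", "phys3"})]
print(valid_virtual_nodes(ok))   # True
print(valid_virtual_nodes(bad))  # False -> phys2 is shared
```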
We encourage you to try out our new features and enhancements! For further information on our offerings, visit the Operations Bridge product page, explore our documentation resources and check out our videos and blogs.
If you have feedback or suggestions, don’t hesitate to comment on this article.
Explore full capabilities of Operations Bridge by taking a look at our Operations Bridge Manager, Operations Bridge Analytics, Operations Bridge Reporter, Operations Connector (OpsCx), Business Value Dashboard (BVD) and Operations Orchestration (OO) documentation!
To get more information on this release and how customers are using Operations Bridge, we are happy to announce the following events:
- Webinar replay – How Garanti Bank Delivers an Excellent End User Experience with Operations Bridge
- Webinar replay – Hybrid IT is the new normal
- Webinar replay – Solving the complexity of Hybrid Cloud monitoring
- Webinar replay – 6 Ways to Simplify the Management of Your Multi-Cloud Environment
- See all the Micro Focus events Worldwide
Read all our news at the Operations Bridge blog.
Explore all the capabilities of the Operations Bridge and technology integrations by visiting these sites: