CSA Telemetry: How to increase uptime and availability for your CSA cluster?

Jaimini Frequent Contributor.

Guest post by Seetharam R, Prasanna Kumari M S, and Ajith Kumar S

As an R&D engineer, I recently had the opportunity to support a large bank that was facing stability issues with Cloud Service Automation (CSA) in a cluster environment. The round-robin load balancing used in the cluster sometimes fails to handle large amounts of data. This often results in issues such as only one of the nodes accepting requests, or nodes receiving too many HTTP calls. The servers can also become unstable or unresponsive due to issues such as:

  1. High incoming flow of HTTP connections to the load balancer
  2. Incorrect handling of HTTP connections when routed through a load balancer
  3. HTTP traffic exceeding the configured limit in JBoss AS

This behaviour results in the server responding with either 503 or 513 errors. It leads to delays in provisioning in a cluster setup and can also cause production outages.

We needed a mechanism to detect whether the nodes are reaching their capacity. Although numerous node and network monitoring tools such as JavaMelody and MoSKito are available, these tools are generic and usually monitor resources such as CPU usage and thread count. They are limited in scope and do not let us monitor information that is specific to CSA.

To solve this problem, you need a tool that can collect information about CSA-specific parameters such as HTTP calls and IP connections. The tool should also help monitor those parameters and detect whether the load is being shared unequally or the nodes are reaching their capacity. This calls for an automated process that collects the application-specific load on the server running in each CSA node of a cluster environment.
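To make the two checks concrete, here is a minimal sketch of how per-node HTTP call counts could be tested for uneven sharing and for nodes approaching capacity. The function name, the thresholds, and the sample numbers are all illustrative assumptions; they are not part of CSA Telemetry.

```python
from statistics import mean


def find_load_problems(http_calls_per_node, capacity,
                       imbalance_ratio=1.5, capacity_threshold=0.8):
    """Return (skewed, near_capacity) lists of node names.

    All parameters are hypothetical: `capacity` is the configured HTTP limit
    per node, `imbalance_ratio` is how far above the cluster average a node
    may go, and `capacity_threshold` is the fraction of capacity that counts
    as "close to the limit".
    """
    average = mean(http_calls_per_node.values())
    # A node is "skewed" when it handles far more than its fair share.
    skewed = sorted(
        node for node, calls in http_calls_per_node.items()
        if average > 0 and calls > imbalance_ratio * average
    )
    # A node is "near capacity" when it approaches the configured limit.
    near_capacity = sorted(
        node for node, calls in http_calls_per_node.items()
        if calls >= capacity_threshold * capacity
    )
    return skewed, near_capacity


# Made-up counts: node1 is taking almost all of the traffic.
counts = {"node1": 900, "node2": 50, "node3": 40}
skewed, near_capacity = find_load_problems(counts, capacity=1000)
print(skewed, near_capacity)  # ['node1'] ['node1']
```

In this example node1 is flagged by both checks, which is the "only one node accepting requests" symptom described earlier.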

The solution is a new tool, “CSA Telemetry”, which is available now.

CSA Telemetry is built on Byteman, an open-source module that makes it easy to trace and monitor the behaviour of Java applications and JDK runtime code. Byteman allows Java code to be injected into application methods, even while the application is running. With Byteman, the JBoss Undertow code is instrumented to gather the required application-specific load information.
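Byteman works on JVM bytecode through its own rule language, so the snippet below is only an analogy. It is a small Python sketch of the same idea: wrapping a method of a live class so that every call is counted, without editing the class itself. `RequestHandler` and `instrument` are hypothetical stand-ins and do not correspond to any CSA, Undertow, or Byteman API.

```python
import functools
from collections import Counter

call_counts = Counter()


def instrument(owner, method_name):
    """Replace owner.method_name with a wrapper that counts calls, then delegates."""
    original = getattr(owner, method_name)

    @functools.wraps(original)
    def counted(*args, **kwargs):
        call_counts[f"{owner.__name__}.{method_name}"] += 1
        return original(*args, **kwargs)

    setattr(owner, method_name, counted)


class RequestHandler:  # stand-in for a server class; not part of CSA or Undertow
    def handle(self, path):
        return f"200 OK {path}"


handler = RequestHandler()            # instance created before instrumentation
instrument(RequestHandler, "handle")  # inject the counter while "running"
handler.handle("/a")
handler.handle("/b")
print(call_counts["RequestHandler.handle"])  # 2
```

The already-created `handler` picks up the counter because the method is looked up on the class at call time, which loosely mirrors instrumenting an application that is already running.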

A background thread in CSA runs periodically, collects the latest application-specific load data, and writes it to the database.
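The periodic collection pattern can be sketched as follows. This is not the CSA implementation: the class name, the `telemetry` table layout, the `read_counters` callback, and the use of SQLite are all assumptions chosen to keep the example self-contained.

```python
import sqlite3
import threading
import time


class TelemetryCollector:
    """Periodically snapshots load counters for one node into a database."""

    def __init__(self, node_id, read_counters, connection, interval_seconds=60.0):
        self.node_id = node_id
        self.read_counters = read_counters  # hypothetical: returns {"metric": value}
        self.connection = connection
        self.interval_seconds = interval_seconds
        self._lock = threading.Lock()       # serializes access to the connection
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        with self._lock:
            self.connection.execute(
                "CREATE TABLE IF NOT EXISTS telemetry "
                "(node_id TEXT, collected_at REAL, metric TEXT, value INTEGER)"
            )

    def collect_once(self):
        """Read the current counters and store one row per metric."""
        now = time.time()
        rows = [(self.node_id, now, m, v) for m, v in self.read_counters().items()]
        with self._lock:
            self.connection.executemany(
                "INSERT INTO telemetry VALUES (?, ?, ?, ?)", rows
            )
            self.connection.commit()

    def _run(self):
        # Event.wait acts as an interruptible sleep; it returns True after stop().
        while not self._stop.wait(self.interval_seconds):
            self.collect_once()

    def start(self):
        self._thread.start()

    def stop(self):
        # Call only after start(); joining an unstarted thread raises RuntimeError.
        self._stop.set()
        self._thread.join()


# check_same_thread=False lets the background thread reuse this connection.
connection = sqlite3.connect(":memory:", check_same_thread=False)
collector = TelemetryCollector(
    "node1", lambda: {"http_calls": 42, "ip_connections_opened": 7}, connection
)
collector.collect_once()  # one manual cycle; start()/stop() run it on a timer
```

Using `Event.wait` rather than `time.sleep` means `stop()` takes effect immediately instead of waiting out the rest of the interval.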

REST APIs are the endpoints that let the user view the load data for the cluster environment. The user specifies the time interval of interest, and the REST API returns the load data for each node in the cluster for that interval. This information lets the user analyse the load on each node and scale the node count up or down to optimize the environment.
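The kind of per-node, per-interval aggregation described here can be sketched like this. The real endpoint, its parameters, and its response format are not shown in this post, so the function name, the sample tuple layout, and the numbers are assumptions for illustration only.

```python
from collections import defaultdict


def load_by_node(samples, start, end):
    """Sum each metric per node for samples with start <= collected_at < end.

    `samples` is an iterable of (node_id, collected_at, metric, value) tuples,
    a hypothetical layout for stored telemetry rows.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for node_id, collected_at, metric, value in samples:
        if start <= collected_at < end:
            totals[node_id][metric] += value
    # Convert nested defaultdicts to plain dicts for a clean result.
    return {node: dict(metrics) for node, metrics in totals.items()}


samples = [
    ("node1", 100, "http_calls", 30),
    ("node1", 160, "http_calls", 45),
    ("node2", 100, "http_calls", 5),
    ("node2", 400, "http_calls", 60),  # outside the interval queried below
]
report = load_by_node(samples, start=100, end=300)
print(report)  # {'node1': {'http_calls': 75}, 'node2': {'http_calls': 5}}
```

A report like this one, with node1 handling 75 calls against node2's 5 in the same window, is the signal that would prompt a look at the load balancer or the node count.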

The information collected from the background thread for each node in a cluster environment includes:

  1. The incoming IP connections made to CSA
  2. The closed IP connections
  3. The incoming CSA and IDM HTTP calls made to JBoss server
  4. The total number of IP and HTTP connections made to CSA
  5. The number of MPP request calls processed by CSA. (The request calls include the number of subscriptions ordered, cancelled, modified and also the public actions carried out on subscriptions).

 Telemetry_architecture_diagram.PNG

Figure 1: Architecture Diagram of CSA Telemetry

Ultimately, the user is given flexibility to choose how they can utilize the telemetry data to monitor their environment.

A sample response of the CSA Telemetry API can be found here:

Telemetry_sample_response.PNG

Figure 2: Sample Response of the Telemetry API

-OR-

Telemetry_sample_request.PNG

Figure 3: Sample Request to the CSA Telemetry API

Telemetry_sample_response_API.png

Figure 4: Sample Response from the CSA Telemetry API

The telemetry project can also be coupled with monitoring tools such as the open-source Nagios or Micro Focus SiteScope for agent-based or agentless monitoring. This lets the customer monitor additional parameters such as the performance of servers, application components, and operating systems.

The scope can be extended further by using the telemetry data with tools such as Micro Focus Business Value Dashboard (BVD) to create customized dashboards, giving the customer flexibility in how they visualize their information.

I hope this peek into how to increase uptime and availability for your Cloud Service Automation cluster has been useful. You can experience these capabilities for yourself with a free trial of CSA here.

The opinions expressed above are the personal opinions of the authors, not of Micro Focus.