
New - Central Thresholding in Operations Bridge

Micro Focus Contributor

As already mentioned in the What’s new in the November release blog post, we introduced a new central thresholding service in the 2018.11 Operations Bridge release. A new thresholding service? Hasn’t Operations Bridge been doing thresholding for years? Why a new service?

Let me explain…

This new service is part of our OpsBridge container deployment and offers thresholding on top of data in our central store. Hence, it is part of our overall COSO strategy. COSO stands for “Collect Once, Store Once”: our strategy is to collect all data required for operations management only once (CO) and to store it once in our central store (SO), from which different services can use it. The central store is also known as the “ITOM Intelligent Data Lake” (IIDL). Performance Management is the name of the capability in the Operations Bridge container deployment.

IIDL can receive all sorts of data, from Operations agents, SiteScope, Business Process Monitor and so on, and many of those already provide thresholding. But IIDL can also receive data from external sources via Operations Connector and other interfaces which might not provide thresholding.  And the Streaming Edition of our Operations Agent – a lightweight agent for cloud use cases – also does not provide any thresholding capability. It can only forward metrics to IIDL.

For such use cases, thresholding on top of the data in IIDL is useful, and that is what the new service provides. This service is part of the Operations Bridge Performance Management capability in the Operations Bridge container deployment.

This new thresholding service monitors the data coming into the store and lets you define thresholds. If a threshold is violated, it generates an alert. This violation information can be forwarded to OBM via a REST web service.

Here, we will discuss how to define the thresholds, focusing on the Operations Agent Streaming Edition use case. This means I will show you how to define thresholds for metrics provided by an Operations Agent Streaming Edition.

How Central Thresholding Works

The thresholding service is designed to scan the data on the input stream of IIDL, detect threshold violations and report them. Every record that gets inserted into a designated IIDL table flows through the stream (except bulk-loaded data) and hence can be scanned for the condition that defines a threshold violation. The IIDL table name is the same as the stream name (Kafka topic name), and the column names of the table are the same as the field names of the record in the stream.

The threshold service can be configured with multiple Threshold Configuration definitions each uniquely identified by a “Threshold Configuration Name”. Each threshold configuration defined operates on one data-stream identified by a “Data Stream Identifier” (kafka topic name) defined in the configuration. Every such configuration can then have multiple classification rules or conditions to classify the records into groups and sub-groups. Every grouping level can then have threshold conditions defined for different severities.
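To make this evaluation model concrete, here is a minimal Python sketch of how one threshold configuration could be applied to one record. It is not the service's implementation. The evaluator supports only the `var`, `>` and `==` JsonLogic operators that appear in the samples in this post, it does not handle nested sub-classifications, and it assumes that the first matching threshold in the list decides the reported severity, which this post does not confirm. The function names and the record values are made up for illustration.

```python
from typing import Any, Optional


def evaluate(condition: Any, record: dict) -> Any:
    """Evaluate a tiny JsonLogic subset: 'var', '>' and '=='."""
    if not isinstance(condition, dict):
        return condition  # literal value such as 95 or a host name
    (operator, args), = condition.items()
    if operator == "var":
        return record.get(args)  # None when the field is absent
    left, right = (evaluate(arg, record) for arg in args)
    if operator == ">":
        return left is not None and left > right
    if operator == "==":
        return left == right
    raise ValueError(f"unsupported operator: {operator}")


def first_violation(thresholds: list, record: dict) -> Optional[str]:
    """Return the severity of the first threshold whose condition holds."""
    for threshold in thresholds:
        if evaluate(threshold["Condition"], record):
            return threshold["Severity"]
    return None


def check_record(config: dict, record: dict) -> list:
    """Collect (classification name, severity) pairs for one record."""
    violations = []
    severity = first_violation(config.get("Thresholds", []), record)
    if severity:
        violations.append((None, severity))  # top-level threshold, no group
    for group in config.get("Classifications", []):
        if evaluate(group["Condition"], record):
            severity = first_violation(group["Thresholds"], record)
            if severity:
                violations.append((group["ClassificationName"], severity))
    return violations


cpu_config = {
    "ThresholdConfigurationName": "streamingCPUThreshold",
    "DataStreamID": "SCOPE_GLOBAL",
    "Thresholds": [
        {"Severity": "Major", "Condition": {">": [{"var": "GBL_CPU_TOTAL_UTIL"}, 95]}},
        {"Severity": "Minor", "Condition": {">": [{"var": "GBL_CPU_TOTAL_UTIL"}, 90]}},
        {"Severity": "Warning", "Condition": {">": [{"var": "GBL_CPU_TOTAL_UTIL"}, 85]}},
    ],
}

print(check_record(cpu_config, {"GBL_CPU_TOTAL_UTIL": 92.5}))  # [(None, 'Minor')]
```

Under the first-match assumption, listing the thresholds from the most severe to the least severe, as the samples in this post do, means a record is reported with the highest severity it qualifies for.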

The thresholding service also provides a REST API to create, read, modify and delete threshold configuration definitions. The API details are described in the documentation here. Though these REST APIs are available, it is recommended that you use config file policies to define the thresholds.

Threshold Configuration for Operations Agent Streaming Edition

If you are forwarding data to our central store using the Operations Agent Streaming Edition and you want to generate an OBM event when a threshold is violated, use the following steps. The thresholds are defined using config file policies, and the events are generated using Event from REST Web Service policies.

You will need to perform the following steps:

  1. Deploy Policies to Stream Metrics into IIDL
  2. Create the Threshold Configuration Policy
  3. Create an Event Mapping Policy
  4. Create Operations Agent Streaming Edition Aspect

Deploy Policies to Stream Metrics into IIDL

For the Operations Agent Streaming Edition use case, let us assume that you have deployed the System Metric Streaming aspect of the Infrastructure Management Pack. The Infrastructure Management Pack comes for free with any Operations Bridge Manager.

If unchanged, it streams a set of important GLOBAL metrics, but you can change what metrics should be sent. You can get the list of metrics in GLOBAL class, when you edit the Sys_SystemMetricStreaming policy in OBM (see Figure1).

Figure1

Make sure to select ITOM Intelligent Data Lake as the target endpoint, see Figure2.

Figure2

Create the Threshold Configuration Policy

Let’s assume you want to define thresholds for the following metrics:

GBL_CPU_TOTAL_UTIL for CPU Utilization

GBL_MEM_UTIL for Memory Utilization

FS_SPACE_UTIL for Filesystem Utilization

BYNETIF_OUT_BYTE_RATE for Network Utilization

For this, you need to create policies of type ‘Configuration File’ in OBM. Use one of the examples at the end of this blog as a template for the policy data. Here, I am considering the CPU example, in which I configure thresholds for the GBL_CPU_TOTAL_UTIL metric. A violation of type ‘Major’ is generated when GBL_CPU_TOTAL_UTIL is greater than 95%, ‘Minor’ when it is greater than 90% and ‘Warning’ when it is greater than 85%.

As a best practice you can use template groups, so you might want to create a group called “Central Thresholding” and create the configuration and event mapping policies in this group.

It is recommended that you review the threshold configuration schema before you create your configuration file, to understand the semantics. The threshold configuration (starting from the ThresholdConfigurationName field in the example below) is a JSON document that conforms to the JSON schema described in Threshold configuration schema.

In OBM, the policy data for CPU looks as shown in Figure3. You can get the CPU threshold configuration and a couple of others from the section (Sample Configuration Policy) at the end.

Figure3

Let us see the mandatory and optional fields for creating the configuration policy:

  1. ThresholdConfigurationName (Mandatory) - A unique configuration name should be given to the newly defined threshold, for example, streamingCPUThreshold.
  2. DataStreamID (Mandatory) - The Kafka topic name for the intended data to be scanned. The Global CPU metrics are available in the SCOPE_GLOBAL Kafka topic. You can get the list of metrics in the GLOBAL class and SCOPE datasource when you edit the Sys_SystemMetricStreaming policy in OBM. Refer to the section above (Deploy Policies to Stream Metrics into IIDL).
  3. ForwardToURL (Mandatory) – You can configure a HTTP(S) receiver to receive the threshold violation data. This is required when you want to get threshold violation events in OBM. You will see later how to create and deploy a corresponding Event from REST Web Service policy to receive threshold violation data and process further to generate OBM events. The source URL of this Event from REST Web Service policy must then be used to set this JSON field.
  4. ForwardAs (Optional) – It is set to “JSON” by default, in which case the HTTP(S) receiver of threshold violations receives the data in JSON format. As we are configuring an Event from REST Web Service policy in OBM to receive the threshold violation data, this field must be set to “XML”, because Event via REST policies can only process XML data.
  5. Thresholds (Optional at top level / Mandatory within classifications) - You can set multiple threshold levels as an array. Each threshold consists of a mandatory “Severity” that is reported back in the threshold violation data, and a mandatory JsonLogic based boolean “Condition” that is evaluated to determine the violation.
  6. Classifications (Optional) - You can define rules to classify the records in different groups and sub-groups and apply different threshold conditions for different classifications. For example…
    Every classification consists of
    • A mandatory “ClassificationName” that is reported back in the threshold violation data.
    • A mandatory JsonLogic based boolean “Condition” that is evaluated to determine the membership to this classification
    • A mandatory “Thresholds” (similar to the one explained above) that is applicable to records that get classified into this classification
    • An optional nested “Classification” that defines the next level of sub-classification.
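The mandatory and optional fields listed above can be checked before a policy is deployed. The following Python sketch does only that; it is not a replacement for the official threshold configuration schema, and the function names and messages are illustrative assumptions. It checks the top level and one level of classifications, not nested sub-classifications.

```python
def check_thresholds(thresholds: list, where: str) -> list:
    """Each threshold needs both a Severity and a Condition."""
    problems = []
    for index, threshold in enumerate(thresholds):
        for field in ("Severity", "Condition"):
            if field not in threshold:
                problems.append(f"{where}: threshold {index} lacks {field}")
    return problems


def validate_config(config: dict) -> list:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    # Mandatory top-level fields as listed in this post.
    for field in ("ThresholdConfigurationName", "DataStreamID", "ForwardToURL"):
        if not config.get(field):
            problems.append(f"missing mandatory field {field}")
    # Thresholds are optional at top level but must be well-formed if present.
    problems += check_thresholds(config.get("Thresholds", []), "top level")
    for group in config.get("Classifications", []):
        where = f"classification {group.get('ClassificationName', '<unnamed>')}"
        for field in ("ClassificationName", "Condition", "Thresholds"):
            if field not in group:
                problems.append(f"{where}: missing mandatory field {field}")
        problems += check_thresholds(group.get("Thresholds", []), where)
    return problems


# Deliberately incomplete example: no ForwardToURL, no Thresholds in the group.
incomplete = {
    "ThresholdConfigurationName": "streamingMEMThreshold",
    "DataStreamID": "SCOPE_GLOBAL",
    "Classifications": [
        {
            "ClassificationName": "criticalVMs",
            "Condition": {"==": [{"var": "host_name"}, "criticalvm.test.com"]},
        }
    ],
}

for problem in validate_config(incomplete):
    print(problem)
```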

There are other fields in the configuration policy (see Figure3). Let us see what these values are:

    1. Application: For example, SystemInfrastructure.
    2. Subgroup: For example, StreamingAgent.
    3. Filename: For example, streamingCPUThreshold.cfg. This is the name of the configuration policy. The policy is stored on the OBM agent under this name.
    4. Data: It references the common policy instrumentation scripts loadThresholdConfig.pl and removeThresholdConfig.pl, which are used to install (InstallCommand) and deinstall (DeinstallCommand) the config file policies. These scripts can be reused from OBM MP for VMWare Infrastructure 1.100.

       loadThresholdConfig.pl takes the following parameters:

          - Application: For example, SystemInfrastructure
          - Subgroup: For example, StreamingAgent
          - Filename: For example, streamingCPUThreshold.cfg
          - ThresholdConfigName: For example, streamingCPUThreshold

       removeThresholdConfig.pl takes the following parameters:

          - ThresholdConfigName: For example, streamingCPUThreshold

       You will see what role the above instrumentation plays in the section below (How Does the Aspect Work?).
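In the samples at the end of this post, the Data field consists of two `#$` command lines followed by the JSON threshold configuration. The following Python sketch splits such policy data and checks that the JSON body parses and that the name passed to the install script matches the configuration name. It is a pre-deployment sanity check written for this post, not a description of what loadThresholdConfig.pl does internally; the helper name `split_policy_data` is an assumption.

```python
import json


def split_policy_data(data: str):
    """Split policy data into its '#$' command lines and the JSON body."""
    commands = {}
    body_lines = []
    for line in data.splitlines():
        if line.startswith("#$"):
            # '#$Installcommand="..."' -> key 'Installcommand', unquoted value
            key, _, value = line[2:].partition("=")
            commands[key.strip()] = value.strip().strip('"')
        else:
            body_lines.append(line)
    # json.loads raises json.JSONDecodeError if the body is not valid JSON.
    return commands, json.loads("\n".join(body_lines))


policy_data = """\
#$Installcommand="/var/opt/OV/bin/instrumentation/loadThresholdConfig.pl SystemInfrastructure StreamingAgent streamingCPUThreshold.cfg streamingCPUThreshold"
#$Deinstallcommand="/var/opt/OV/bin/instrumentation/removeThresholdConfig.pl streamingCPUThreshold"
{
    "ThresholdConfigurationName" : "streamingCPUThreshold",
    "DataStreamID" : "SCOPE_GLOBAL",
    "Thresholds" : [
       {
          "Severity" : "Major",
          "Condition" : {">" : [{"var" : "GBL_CPU_TOTAL_UTIL"}, 95 ]}
       }
    ]
}
"""

commands, config = split_policy_data(policy_data)
# The last argument of both scripts is the threshold configuration name.
install_name = commands["Installcommand"].split()[-1]
deinstall_name = commands["Deinstallcommand"].split()[-1]
print(install_name == deinstall_name == config["ThresholdConfigurationName"])  # True
```

Running a check like this on each policy before deployment catches unbalanced brackets in the JSON and mismatched configuration names early.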

Similar to the example for CPU (streamingCPUThreshold), you can create threshold policies for Memory, Filesystem and Network. You can see the configuration policy data for CPU, Memory, Filesystem and Network from the section (Sample Configuration Policy) at the end. If you want to create different kinds of configuration policies with multiple classifications and groups see here.

You can see in Figure4, the four configuration policies configured in OBM.

Figure4

Create an Event Mapping Policy

The next step is to create an ‘Event from REST Web Service’ policy in OBM that receives threshold violation data and generates corresponding events. Follow the steps below to create an Event Mapping Policy.

1. Create a new ‘Event from REST Web Service’ policy under the Event templates in OBM. For example, StreamingAgent_CPUMemUtilThresholdMapping, see Figure5.

Figure5

2. In the ‘Source’ tab, enter the name of the REST endpoint. This name has to be the same as the one in the ‘ForwardToURL’ used in the configuration policy. For example, streaming_vm_threshold. Enter ‘root’ for ‘XML Event tag’. See Figure6.

Figure6

3. In the ‘Defaults’ tab, enter the details as shown in Figure7.

Figure7

4. In the ‘Rules’ tab, create policy rules. For example, VM_CPU and VM_MEM. You can give the condition for both rules as shown in Figure8.

Figure8

You can set the Event Attributes as required, see Figure9. The value for the attributes has to be determined by looking at the Kafka topic or the corresponding table where the data is stored in the store. Alternatively, you can get the list of metrics in the GLOBAL class when you edit the Sys_SystemMetricStreaming policy in OBM (see Figure1).

Figure9
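Because the REST endpoint name in the ‘Source’ tab has to match the ForwardToURL of the configuration policy, and the Event via REST policy only processes XML, a small consistency check can be useful. The Python sketch below assumes that the endpoint name is the last path segment of ForwardToURL, as it is in the sample URLs in this post; the function names are illustrative.

```python
from urllib.parse import urlparse


def endpoint_name(forward_to_url: str) -> str:
    """Return the last path segment of a ForwardToURL."""
    path = urlparse(forward_to_url).path
    return path.rstrip("/").rsplit("/", 1)[-1]


def matches_event_policy(config: dict, rest_endpoint: str) -> bool:
    """True if the config forwards XML to the given REST endpoint name."""
    same_endpoint = endpoint_name(config["ForwardToURL"]) == rest_endpoint
    # ForwardAs defaults to JSON, which Event via REST policies cannot process.
    sends_xml = config.get("ForwardAs", "JSON") == "XML"
    return same_endpoint and sends_xml


config = {
    "ThresholdConfigurationName": "streamingCPUThreshold",
    "ForwardToURL": "http://itom-collect-once-data-broker-svc:30005/bsmc/rest/events/streaming_vm_threshold",
    "ForwardAs": "XML",
}

print(matches_event_policy(config, "streaming_vm_threshold"))  # True
```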

Create Operations Agent Streaming Edition Aspect

You can create an Aspect for Operations Agent Streaming Edition which will utilize all the configuration and event policies. For example, streamingThresholds, which consists of the following files:

Instrumentation

      loadThresholdConfig

      removeThresholdConfig

Config file policies

      StreamingAgent_CPUUtilThreshold (CPU utilization)

      StreamingAgent_MemUtilThreshold (Memory utilization)

      StreamingAgent_FileSystemUtilThreshold (Filesystem utilization)

      StreamingAgent_NetworkUtilThreshold (Network utilization)

Event via REST policies

      StreamingAgent_CPUMemUtilThresholdMapping (Rules for CPU and Mem alerts)

      StreamingAgent_FileSystemUtilThresholdMapping (Rules for FileSystem alerts for VMs)

      StreamingAgent_NetworkUtilThresholdMapping (Rules for Network Utilization alerts for VMs)

How Does the Aspect Work?

The newly created Aspect “streamingThresholds” needs to be deployed to the Collect Once Data Broker Operations Agent to start detecting threshold violations. When a config file policy is deployed, its InstallCommand runs the Perl script loadThresholdConfig.pl, which reads the configuration file and creates the POST request for the threshold configuration.

Un-deploying these config file policies results in the invocation of the DeinstallCommand, removeThresholdConfig.pl. This script then invokes the REST API call to remove the corresponding threshold config definition.

Every threshold violation detected from a particular threshold config is forwarded to the associated REST endpoint defined by the field <ForwardToURL>. This field is set to the URL of an Event via REST policy endpoint defined in the event mapping policy. The threshold violation data can be processed further to generate the OBM events. Details like severity, the group membership, the threshold configuration, and the original record are available via the REST web service and can be used to set OBM event attributes.

If you want to change the threshold values for any of the configurations, update them in the configuration file policy and redeploy the aspect with the updated version of the policy.

Sample Configuration Policy

You can see the configuration policy for CPU, Memory, Filesystem and Network below.

Configuration Policy for CPU Utilization Thresholds

In this example, threshold violation of ‘Major’ will be generated when CPU Utilization is greater than 95%, ‘Minor’ when greater than 90% and ‘Warning’ when greater than 85%.

Application=SystemInfrastructure
SubGroup=StreamingAgent
Filename=streamingCPUThreshold.cfg
Data:
#$Installcommand="/var/opt/OV/bin/instrumentation/loadThresholdConfig.pl SystemInfrastructure StreamingAgent streamingCPUThreshold.cfg streamingCPUThreshold"
#$Deinstallcommand="/var/opt/OV/bin/instrumentation/removeThresholdConfig.pl streamingCPUThreshold"
{
    "ThresholdConfigurationName" : "streamingCPUThreshold",
    "DataStreamID" : "SCOPE_GLOBAL",
    "ForwardToURL" : "http://itom-collect-once-data-broker-svc:30005/bsmc/rest/events/streaming_vm_threshold",
    "ForwardAs" : "XML",
    "Thresholds" : [
       {
          "Severity" : "Major",
          "Condition" : {">" : [{"var" : "GBL_CPU_TOTAL_UTIL"}, 95 ]}
       },
       {
          "Severity" : "Minor",
          "Condition" : {">" : [{"var" : "GBL_CPU_TOTAL_UTIL"}, 90 ]}
       },
       {
          "Severity" : "Warning",
          "Condition" : {">" : [{"var" : "GBL_CPU_TOTAL_UTIL"}, 85 ]}
       }
    ]
 }

Configuration Policy for Memory Utilization Thresholds

In this example, threshold violations for Memory will be generated only for the VM ‘criticalvm.test.com’: ‘Major’ when Memory Utilization is greater than 98%, ‘Minor’ when greater than 96% and ‘Warning’ when greater than 90%.

Application=SystemInfrastructure
SubGroup=StreamingAgent
Filename=streamingMEMThreshold.cfg
Data:
#$Installcommand="/var/opt/OV/bin/instrumentation/loadThresholdConfig.pl SystemInfrastructure StreamingAgent streamingMEMThreshold.cfg streamingMEMThreshold"
#$Deinstallcommand="/var/opt/OV/bin/instrumentation/removeThresholdConfig.pl streamingMEMThreshold"
{
    "ThresholdConfigurationName" : "streamingMEMThreshold",
    "DataStreamID" : "SCOPE_GLOBAL",
    "ForwardToURL" : "http://itom-collect-once-data-broker-svc:30005/bsmc/rest/events/streaming_vm_threshold",
    "ForwardAs" : "XML",
    "Classifications" : [
    {
    "ClassificationName" : "criticalVMs",
    "Condition" : {"==" : [{"var" : "host_name"}, "criticalvm.test.com"]},
    "Thresholds" : [
      {
          "Severity" : "Major",
          "Condition" : {">" : [{"var" : "GBL_MEM_UTIL"}, 98 ]}
       },
       {
          "Severity" : "Minor",
          "Condition" : {">" : [{"var" : "GBL_MEM_UTIL"}, 96 ]}
       },
       {
          "Severity" : "Warning",
          "Condition" : {">" : [{"var" : "GBL_MEM_UTIL"}, 90 ]}
       }
    ]
    }
    ]
 }

Configuration Policy for FileSystem Utilization Thresholds

In this example, threshold violation for Filesystem will be generated only for the ‘/boot’ directory. ‘Major’ when Filesystem Utilization is greater than 95%, ‘Minor’ when greater than 90% and ‘Warning’ when greater than 85%.

Application=SystemInfrastructure
SubGroup=StreamingAgent
Filename=streamingFileSystemThreshold.cfg
Data:
#$Installcommand="/var/opt/OV/bin/instrumentation/loadThresholdConfig.pl SystemInfrastructure StreamingAgent streamingFileSystemThreshold.cfg streamingFileSystemThreshold"
#$Deinstallcommand="/var/opt/OV/bin/instrumentation/removeThresholdConfig.pl streamingFileSystemThreshold"
{
    "ThresholdConfigurationName" : "streamingFileSystemThreshold",
    "DataStreamID" : "SCOPE_FILESYSTEM",
    "ForwardToURL" : "http://itom-collect-once-data-broker-svc:30005/bsmc/rest/events/streaming_vm_threshold",
    "ForwardAs" : "XML",
     "Classifications" : [
    {
    "ClassificationName" : "bootdir",
    "Condition" : {"==" : [{"var" : "FS_DIRNAME"}, "/boot"]},
    "Thresholds" : [
       {
          "Severity" : "Major",
          "Condition" : {">" : [{"var" : "FS_SPACE_UTIL"}, 95 ]}
       },
       {
          "Severity" : "Minor",
          "Condition" : {">" : [{"var" : "FS_SPACE_UTIL"}, 90 ]}
       },
       {
          "Severity" : "Warning",
          "Condition" : {">" : [{"var" : "FS_SPACE_UTIL"}, 85 ]}
       }
    ]
    }
    ]
}

Configuration Policy for Network Utilization Thresholds

In this example, threshold violation of ‘Major’ will be generated when Network Utilization is greater than 5000, ‘Minor’ when greater than 4500 and ‘Warning’ when greater than 4000.

Application=SystemInfrastructure
SubGroup=StreamingAgent
Filename=streamingNetworkThreshold.cfg
Data:
#$Installcommand="/var/opt/OV/bin/instrumentation/loadThresholdConfig.pl SystemInfrastructure StreamingAgent streamingNetworkThreshold.cfg streamingNetworkThreshold"
#$Deinstallcommand="/var/opt/OV/bin/instrumentation/removeThresholdConfig.pl streamingNetworkThreshold"
{
    "ThresholdConfigurationName" : "streamingNetworkThreshold",
    "DataStreamID" : "SCOPE_NETIF",
    "ForwardToURL" : "http://itom-collect-once-data-broker-svc:30005/bsmc/rest/events/streaming_vm_threshold",
    "ForwardAs" : "XML",
    "Thresholds" : [
       {
          "Severity" : "Major",
          "Condition" : {">" : [{"var" : "BYNETIF_OUT_BYTE_RATE"}, 5000 ]}
       },
       {
          "Severity" : "Minor",
          "Condition" : {">" : [{"var" : "BYNETIF_OUT_BYTE_RATE"}, 4500 ]}
       },
       {
          "Severity" : "Warning",
          "Condition" : {">" : [{"var" : "BYNETIF_OUT_BYTE_RATE"}, 4000 ]}
       }
    ]
}

 

As you can see, Central Thresholding is a valuable and flexible capability that allows you to send alerts on all types of data arriving at the Store (IIDL): not just data from sources that generate their own alerts, but also data from sources that do not currently have a thresholding capability. This feature is especially useful when thresholding is not available at the data source or is not under the control of the central monitoring team, for example, for the Virtualization collector and the Streaming agent.

We encourage you to try out our new features and enhancements! For further information on our offerings, visit the Operations Bridge product page, explore our documentation resources and check out our videos and blogs.

If you have feedback or suggestions, don’t hesitate to comment on this article.

Explore full capabilities of Operations Bridge by taking a look at our Operations Bridge Manager, Operations Bridge Analytics, Operations Bridge Reporter, Operations Connector (OpsCx), Business Value Dashboard (BVD) and Operations Orchestration (OO) documentation!
