Server Hung State Monitor in SiteScope/OBM

Hi Experts,

Can we monitor a server's hung state in Micro Focus SiteScope or in OBM? If yes, please share some steps or suggestions on how to do this.

This is for Both Windows and Linux.

Regards,

Pranav R N

  • Suggested Answer


    Hello,

    OBM:  I'm glad you asked me that question :-)  Yes you can - please see https://portal.microfocus.com/s/article/KM000021184?language=en_US.  This document describes how OBM can monitor its own event processing and how you can configure policies to send notifications, either via the notification interface or via some other method, such as email.  It's very simple to extend.  This is for both Windows and Linux.

    SiS: There is some self monitoring, but it's not something I know much about.

    Thanks.




    --
    If you found this post useful, give it a “Like” or click on "Verify Answer" under the "More" button

  • 0 in reply to   

    Hi Duncan,

    Thanks a lot for your reply and the doc link.

    As per the doc link, I think we can monitor only our own OBM processes and services. Please correct me if my understanding is wrong.

    Our requirement is to get an alert when a monitored node is in a hung state. Is it possible to get that from the OBM side or from the SiS side?

    Regards,

    Pranav R N

  • Suggested Answer


    Hello,

    Please check the template groups "OMi Server Self-Monitoring" and "Advanced: OBM Server Monitoring".

    Service/process monitoring is part of the OBM Management Packs.  But just because a process is running doesn't mean it's doing much, and that's why OMi_BusMonitor is so useful.  The policy runs a Perl script called /var/opt/OV/bin/instrumentation/OMiSmBusMonitoring.pl

    The script does, in Perl, roughly the equivalent of this shell command:

    # var=$(date -d @$(/opt/HP/BSM/opr/support/opr-jmxClient.sh -s localhost:29622 -b opr.store:name=PipelineStatsMBean -m getTimeOfLastRunInSeconds -a PipelineEntry))

    RMI URL = 'service:jmx:rmi://localhost:29622/jndi/rmi://localhost:29622/jmxrmi'

    # echo $var
    Tue Jul 23 18:39:36 CEST 2024

    This means the JMS bus last ran at "Tue Jul 23 18:39:36 CEST 2024", which is about now.  If the bus doesn't run, events are not being processed.  The OMi_BusMonitor automatic action then sends an event to OBM's notification server, OR you can send an email using mailx(1), or change the script to do whatever else you feel is appropriate.  I think this is the best method.
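    The staleness test itself is simple enough to sketch separately. The following is only an illustration, not what OMiSmBusMonitoring.pl actually does: the function name is_stale and the MAX_AGE threshold are made up for this example, and the commented-out wiring assumes that opr-jmxClient.sh prints only the epoch seconds on stdout.

```shell
#!/bin/bash
# Sketch: decide whether the event pipeline looks stalled, given the epoch
# timestamp of its last run. MAX_AGE (seconds) is a hypothetical threshold.
MAX_AGE=${MAX_AGE:-300}

# is_stale LAST_RUN_EPOCH [NOW_EPOCH]
# Exit status 0 if the last run is more than MAX_AGE seconds old.
is_stale() {
  local last_run=$1
  local now=${2:-$(date +%s)}
  [ $(( now - last_run )) -gt "$MAX_AGE" ]
}

# Example wiring (commented out; the command is the one quoted in this post):
# last=$(/opt/HP/BSM/opr/support/opr-jmxClient.sh -s localhost:29622 \
#   -b opr.store:name=PipelineStatsMBean -m getTimeOfLastRunInSeconds -a PipelineEntry)
# is_stale "$last" && echo "Event pipeline has not run for more than $MAX_AGE seconds"
```

    The second argument exists only so the check can be tried with fixed timestamps; in normal use it defaults to the current time.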

    Another option would be to enable event logging mode in the infrastructure settings and then use a script to check whether /opt/HP/BSM/log/opr-backend/opr-flowtrace-backend.log is still being updated.  This is a simple unsupported example which you might run as a scheduled action or from cron. It assumes an event is processed at least every 60 seconds and alerts using mailx/opcmsg:

    #!/bin/bash
    # Unsupported example: warn when the flowtrace log stops being updated.
    LOGFILE="/opt/HP/BSM/log/opr-backend/opr-flowtrace-backend.log"
    TO_EMAIL="somemail@someco.co"
    CHECK_INTERVAL=60   # seconds between checks

    send_warning() {
      echo "The log file $LOGFILE has not been updated in the last $CHECK_INTERVAL seconds." \
        | mail -s "opr-flowtrace-backend not updated in $CHECK_INTERVAL seconds" "$TO_EMAIL"
    }

    if [ ! -f "$LOGFILE" ]; then
      /opt/OV/bin/opcmsg o=OBM a=events msg_text="Log file not found: $LOGFILE" sev=critical
      echo "The log file $LOGFILE was not found" | mail -s "Log file not found: $LOGFILE" "$TO_EMAIL"
      exit 1
    fi

    LAST_MOD_TIME=$(stat -c %Y "$LOGFILE")

    # Echo EventReceiver lines to stdout while the check runs.
    # --pid (GNU tail) makes tail exit once this script has ended.
    tail --pid=$$ -f "$LOGFILE" | grep "EventReceiver" &

    while true; do
      sleep "$CHECK_INTERVAL"
      CURRENT_MOD_TIME=$(stat -c %Y "$LOGFILE")

      # An unchanged modification time means nothing was logged in this interval
      if [ "$CURRENT_MOD_TIME" -eq "$LAST_MOD_TIME" ]; then
        send_warning
      else
        LAST_MOD_TIME=$CURRENT_MOD_TIME
      fi
    done
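
    The script above loops forever, which suits a long-running scheduled action better than cron. For cron, a single-pass check may be easier to handle. A minimal sketch, assuming GNU stat; the function names file_age_seconds and log_is_fresh are made up for this example:

```shell
#!/bin/bash
# Single-pass freshness check, intended to be called once per cron run.

# file_age_seconds FILE
# Prints the number of seconds since FILE was last modified (GNU stat).
file_age_seconds() {
  local mtime
  mtime=$(stat -c %Y "$1") || return 2
  echo $(( $(date +%s) - mtime ))
}

# log_is_fresh FILE MAX_AGE
# Exit status 0 if FILE was modified within the last MAX_AGE seconds,
# 1 if it is older, 2 if the file could not be read.
log_is_fresh() {
  local age
  age=$(file_age_seconds "$1") || return 2
  [ "$age" -le "$2" ]
}
```

    It could then be used as: log_is_fresh "$LOGFILE" 60 || send_warning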

    There are many other options...

    I hope you get a better answer.


  • 0   in reply to   

    Hello,

    I've made a massive assumption that it's the OBM server you want to monitor and that might not be the case.

    Do you mean using our software to detect whether a monitored system has been impacted?  For example, by the CrowdStrike issue?  If a system bluescreens (BSOD - Blue Screen of Death) due to CrowdStrike (or anything else), the system will halt.  A BSOD means the OS has encountered a critical error from which it cannot recover, and it crashes.

    The OS will have stopped and will need to go through a recovery process.  The Windows OS itself handles network stack operations, including responding to ping requests, so in the BSOD state the system will appear to be offline.

    In OpsB terms, OA12 will not be running.  Heartbeat polling (HBP) will fail and the OBM server (assuming it's not running Windows and also affected) will identify that the Windows system has stopped.  SiS will do the same thing.  If the customer has NNMi, the system will eventually be marked as down because polling will fail.

    You could use OBM to check whether all the Windows servers are running.

    Below are some sample unsupported scripts which might help; alternatively, wait until HBP or NNMi complains.  Either way, you'll have to get the list of nodes from opr-node or from a view:

     
    
    # PowerShell
    $nodes = @("node1", "node2", "node3")

    foreach ($node in $nodes) {
        # Set the timestamp first so the catch block can use it too
        $timestamp = Get-Date -Format "dd-MM-yyyy HH:mm:ss"
        try {
            $output = & "ovdeploy" -node $node -cmd "dir" 2>&1
            if ($output) {
                Write-Output "$timestamp - Node $node is working and ovdeploy is ok. Output:"
                Write-Output $output
            } else {
                Write-Output "$timestamp - Node $node did not return any output."
            }
        } catch {
            # ${node} is braced because "$node:" would be parsed as a drive-qualified variable
            Write-Output "$timestamp - Error occurred while running command on node ${node}: $_"
        }
    }
     

    Or:

    #!/bin/bash
    nodes=("node1" "node2" "node3")

    for node in "${nodes[@]}"; do
        # Note: stderr is captured too, so an error message also counts as output
        output=$(ovdeploy -node "$node" -cmd "dir" 2>&1)
        if [ -n "$output" ]; then
            echo "$(date) Node $node is working and ovdeploy is ok. Output:"
            echo "$output"
        else
            echo "$(date) Node $node did not return any output."
        fi
    done
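
    One thing to watch with a loop like this is that a node which never answers could hold up the checks for every node after it. A possible way around that, sketched here with the coreutils timeout command, is to put an upper limit on each check. The function name probe and the CHECK_TIMEOUT value are made up for this example, and the ovdeploy wiring is left commented out:

```shell
#!/bin/bash
# Sketch: bound each remote check with coreutils timeout(1) so that one
# unresponsive node cannot stall the whole loop.
# CHECK_TIMEOUT (seconds) is a hypothetical limit.
CHECK_TIMEOUT=${CHECK_TIMEOUT:-30}

# probe CMD [ARGS...]
# Runs CMD under the time limit and prints OK, TIMEOUT or FAILED.
probe() {
  local rc=0
  timeout "$CHECK_TIMEOUT" "$@" >/dev/null 2>&1 || rc=$?
  case $rc in
    0)   echo "OK" ;;
    124) echo "TIMEOUT" ;;  # exit status timeout(1) returns when the limit is hit
    *)   echo "FAILED" ;;
  esac
}

# Example wiring (commented out; ovdeploy call as in the script above):
# for node in node1 node2 node3; do
#   echo "$(date) $node: $(probe ovdeploy -node "$node" -cmd "dir")"
# done
```

    A TIMEOUT result is the interesting one here, since it separates "no answer within the limit" from an outright error.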
    
     

    I hope you get a better answer,


  • 0

    If the servers are monitored in OBM, you can enable the health check so that, if the agent does not respond within the configured timeframe, OBM raises an event saying so.