Agent Health Problem. The agent did not send events during the last 35 minutes. No additional information is available.

Hi,

From time to time we get the following message in our OBM (2023.05 LNX):

Agent Health Problem. The agent did not send events during the last 35 minutes. No additional information is available. 

It is a different agent each time, and so far I don't see a pattern.

When we check the agents that supposedly did not send events, everything is OK. The OBM log shows the problem event followed by the recovery about a minute later:

2024-09-26 05:48:31,569 [Thread-58] INFO AgentHeartbeatImpl.submitEvent(83) - sent event: db2f1e15-a2fa-445b-aa9b-50f618b525f3 for xxx|yyy|e30ee484-acf4-75e3-04d7-b7f0331e3b0d; severity CRITICAL; Agent Health Problem. The agent did not send events during the last 35 minutes. No additional information is available.
2024-09-26 05:49:31,218 [Thread-58] INFO AgentHeartbeatImpl.submitEvent(83) - sent event: c6ff3453-a22a-443b-8bed-f1053778eb26 for xxx|yyy|e30ee484-acf4-75e3-04d7-b7f0331e3b0d; severity NORMAL; Agent Health Ok.
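To look for a pattern across agents, log lines like the two above could be paired up per agent. The following is only a rough sketch: the regular expression is inferred from these two sample lines alone, and the names `parse_line` and `outages` are made up for illustration, not part of OBM.

```python
import re
from datetime import datetime

# Layout inferred from the two sample lines above (assumption, not an OBM spec).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"sent event: (?P<event_id>\S+) for (?P<target>[^;\s]+); "
    r"severity (?P<severity>\w+); (?P<text>.*)$"
)


def parse_line(line):
    """Return a dict for a 'sent event' line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return record


def outages(lines):
    """Pair each CRITICAL event with the next NORMAL event for the same target.

    Returns a list of (target, critical_timestamp, seconds_until_normal).
    """
    open_since = {}
    result = []
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        target = record["target"]
        if record["severity"] == "CRITICAL":
            # Keep the first CRITICAL timestamp if several arrive in a row.
            open_since.setdefault(target, record["ts"])
        elif record["severity"] == "NORMAL" and target in open_since:
            start = open_since.pop(target)
            result.append((target, start, (record["ts"] - start).total_seconds()))
    return result
```

Feeding the whole log file through `outages` gives one row per incident, which can then be sorted by agent or time of day to see whether the problems cluster.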

Is it possible to generate more logging/debug output, i.e. to make the heartbeat mechanism more verbose? For example, could it write to the log the first time a heartbeat from the agent is missed?

I could take this information to our network team and start a trace of what is actually sent over the LAN, just to be able to pinpoint whether the issue is in OBM or on our LAN.

Any help is much appreciated.

  • 0  

    Please check in Monitored Nodes which type of Agent Health check is configured for that agent. It can be "agent", "agent + server" or "none".
    Based on your timings I assume it is "agent + server", which means that if the agent has not sent an event for 30 minutes, the server tries to reach the agent. If the server gets no response from the agent within 5 minutes, you get that Agent Health event.

    So, based on your timings, I would suggest checking why the server could not reach the agent.
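The 30 + 5 minute timing described above can be turned into a time window to hand to the network team. This is a small sketch under the assumption that the timing works exactly as described in this reply; the constants and function names are illustrative, not OBM settings.

```python
from datetime import datetime, timedelta

# Assumed values, taken from the description above rather than from OBM itself.
HEARTBEAT_INTERVAL = timedelta(minutes=30)  # silence before the server polls the agent
POLL_GRACE = timedelta(minutes=5)           # time the server waits for a response


def last_event_before(alert_time):
    """Estimate when the server last counted an event from the agent."""
    return alert_time - (HEARTBEAT_INTERVAL + POLL_GRACE)


def poll_window(alert_time):
    """Estimate the interval in which the server tried to reach the agent."""
    return alert_time - POLL_GRACE, alert_time


alert = datetime(2024, 9, 26, 5, 48, 31)
print("last event counted around:", last_event_before(alert))
print("server polling window:", poll_window(alert))
```

For the CRITICAL event logged at 05:48:31, this puts the last counted agent event at about 05:13:31 and the server's polling attempts between about 05:43:31 and 05:48:31, which is the window a network trace would need to cover.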

  • 0 in reply to   

    My plan is to add an automatic action or a Groovy script that triggers when the message appears, but by then it may already be too late. So if there is a way to get a message earlier, that would be great.

  • 0 in reply to 

    I also notice that the following value is set on our agents: OPC_HB_MSG_INTERVAL=1800

    I wonder whether this value should be decreased to 300. I assume that each time the server gets an agent heartbeat, the 30-minute interval is reset.