Commodore

How can I troubleshoot a "stale collections" minor alarm on NNMi?

Hello,
How can I troubleshoot a "stale collections" minor alarm on NNMi?


I'm running HP NNMi 9.11.004 on RH Linux 5.8, and I have a fairly persistent Health alarm that comes in at about the same time every day and then clears five minutes later.


The text of the alarm is:
"Stale collections (2) has status Minor because there were between 0 and 5 stale collections in the last 5 minutes."

 

I've checked the statepoller log, but no entry there lines up with the timestamp of the health alarm.


Any help is appreciated.

Thanks

Micro Focus Expert

Hello,

 

A stale collection is one that has been sent to the polling engine but has had no response in 10 minutes.

Generally, even a slow or unresponsive SNMP agent should end up with the polling engine returning that fact to statepoller so that it can process it and move on.

 

If no such response arrives from the polling engine, it suggests that either the polling engine or statepoller has become overwhelmed and is not processing its end of the queues between them. This could happen because of a temporary load on the system as a whole (starving individual components such as statepoller and the polling engine of resources), or because of an increase in polling activity, perhaps from a serious outage that leads APA/Causal to send many named polls to neighbour devices to determine a root cause. Or maybe a batch of nodes was managed/unmanaged or seeded into discovery, or something else. It can be quite difficult to find that trigger point.

 

You could examine the state poller and SNMP metrics under the health mechanism and look for increased activity compared to "normal" times. The "Last 5 Minutes" metrics are of particular interest if you catch the system just after the problem has happened.

 

nnmhealth.ovpl -print verbose -filter StatePoller,SNMPHealthAgent
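To compare "normal" times against the alarm window, it can help to save the output of that command with a timestamp each time it runs. The following is only a minimal sketch: the command line is the one quoted above, while the function name `snapshot_health`, the `HEALTH_CMD` and `SNAP_DIR` variables, and the output directory are assumptions made for illustration.

```shell
#!/bin/sh
# Command line taken from the post above; override HEALTH_CMD if the
# script is not on your PATH.
HEALTH_CMD=${HEALTH_CMD:-"nnmhealth.ovpl -print verbose -filter StatePoller,SNMPHealthAgent"}
# Hypothetical directory for the saved snapshots.
SNAP_DIR=${SNAP_DIR:-/tmp/nnmhealth-snapshots}

# Run the health command once, save its output to a timestamped file,
# and print the name of that file.
snapshot_health() {
    mkdir -p "$SNAP_DIR" || return 1
    out="$SNAP_DIR/health-$(date +%Y%m%d-%H%M%S).txt"
    # Word splitting of $HEALTH_CMD is intentional (command plus arguments).
    $HEALTH_CMD > "$out" 2>&1
    echo "$out"
}
```

Calling `snapshot_health` every few minutes around the time the alarm usually fires, and once or twice during a quiet period, gives you files you can compare with `diff`.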

 

Hopefully the metrics will show something if you examine them over time. If you see extra activity at certain times, it is worth checking for crontab entries or other jobs that launch at the same time each day and could load the system in this way. Check the nnm.#.#.log files to see whether anything else is writing output around that time.
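As a rough way to do the crontab check, the sketch below filters crontab text for entries that can run during a given hour. The function name `cron_jobs_at_hour` is made up for this example, and it only understands a plain number or `*` in the hour field, so entries that use ranges, lists, or steps (such as `1-5` or `*/2`) still need to be checked by eye.

```shell
#!/bin/sh
# cron_jobs_at_hour HOUR
# Reads crontab text on stdin and prints the entries whose hour field
# (second field) is either HOUR or "*".
cron_jobs_at_hour() {
    awk -v h="$1" '
        NF == 0 || $1 ~ /^#/ { next }     # skip blank lines and comments
        $1 !~ /^[0-9*]/      { next }     # skip VAR=value and @daily-style lines
        $2 == "*" || $2 == h { print }
    '
}
```

For example, if the alarm arrives shortly after 03:00 (a hypothetical time), `crontab -l | cron_jobs_at_hour 3` lists the current user's candidates, and `cat /etc/crontab | cron_jobs_at_hour 3` does the same for the system crontab.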

 

The only other option is tracing. If it happens within the same 10-minute window or so every day, you could switch on state poller tracing for that period (nnmsetlogginglevel.ovpl com.hp.ov.nms.statepoller FINER). To switch it off again, repeat the command with CONFIG instead of FINER.
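Since tracing should be on for as short a time as possible, one way to avoid forgetting to switch it off is to wrap the two commands in a small function. This is a sketch only: the command, logger name, and the FINER and CONFIG levels are the ones given above, while `trace_window`, `LOG_CMD`, and `LOGGER` are hypothetical names, and the duration is whatever you pass in.

```shell
#!/bin/sh
# Command and level names (FINER, CONFIG) are taken from the post above.
LOG_CMD=${LOG_CMD:-nnmsetlogginglevel.ovpl}
LOGGER=com.hp.ov.nms.statepoller

# trace_window SECONDS
# Raises the statepoller logging level to FINER, waits SECONDS, then sets
# it back to CONFIG. The level is also set back if the wait is interrupted.
trace_window() {
    seconds=$1
    (
        trap '$LOG_CMD "$LOGGER" CONFIG' EXIT   # always restore on exit
        trap 'exit 1' INT TERM                  # turn Ctrl-C into a normal exit
        $LOG_CMD "$LOGGER" FINER || exit 1
        sleep "$seconds"
    )
}
```

Something like `trace_window 600`, started a few minutes before the alarm is expected, would cover a 10-minute window.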

 

The problem is that the overhead of tracing could actually make things worse, because the system is loaded further by writing a lot of data to files.

 

Sorry, there is no easy answer for this type of problem. It is probably better to start troubleshooting from 30,000 ft before trying to trace and analyse. The fact that the problem occurs at the same time each day is a big clue: what has changed recently on your system, network, or management environment that could affect polling, SNMP, or overall resource provision?

 

Regards,

Darren

 

ArcSight Support
If you find that this or any post resolves your issue, please be sure to mark it as an accepted solution.
Commodore

Darren,

Thank you very much for your detailed description of what might be happening with these stale collection alarms.

 

I will definitely start from 30,000 feet, now that I have some idea of what to look for.

Cadet 2nd Class

Hello RPD

 

Darren's suggestion gives a fairly complete explanation of why this happens and what could be causing it. One more thing you can do is check your monitoring policies to see whether someone changed them to a shorter polling interval that is putting stress on statepoller.

 

HP Support

The views expressed in my contributions are my own and do not necessarily reflect the views and strategy of HP
