rolling logfile policy missing alerts
I have a logfile policy that watches a file which is rolled every day. That is, at midnight the current file is renamed with yesterday's timestamp and a new file with the original name is started. We seem to be missing alerts for entries that are written after the last poll and before the rename.
For example, with a 5 minute polling cycle the file is polled at 23:57, an error is written at 23:58, the new file starts at 00:00, the next poll is at 00:02. As the error now resides in the old file the alert is missed.
Other than reducing the polling to the minimum 5 seconds (this still leaves a 5 second window where potential alerts could be missed) is there something that can be done to avoid missing these log entries?
Also, what is the performance impact on the managed node of setting the log policy to poll every 5 seconds as opposed to something like 5 or 10 minutes?
Re: rolling logfile policy missing alerts
I have pondered on this scenario for quite a while as well.
I have not been able to rely completely on opcle picking up matches within an appropriate period.
Occasionally opcle can lose its pointer in memory and then re-parse the logfile from the beginning.
There are quite a few Knowledge Documents covering the behaviour of opcle in relation to logfile monitoring.
The only way I can envision to make sure these messages are captured is to have a separate script that parses the messages logged in the last 5 minutes of the old logfile and appends them to the new file once it has been rotated.
You may potentially get duplicates logged via pattern matching, although at least you know they are captured.
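A minimal sketch of such a carry-over script, in Python. It assumes every log line starts with a timestamp in the form "YYYY-MM-DD HH:MM:SS"; that layout, the function names and the file paths are my own placeholders, not anything the agent provides, so adjust them to your log format.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout at the start of each line
TS_LENGTH = 19                   # number of characters a timestamp in that layout occupies


def recent_lines(lines, now, window=timedelta(minutes=5)):
    """Return the lines whose leading timestamp lies within `window` before `now`."""
    cutoff = now - window
    kept = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # no parsable timestamp (e.g. a continuation line): skip it
        if cutoff <= stamp <= now:
            kept.append(line)
    return kept


def carry_over(rotated_path, current_path, now, window=timedelta(minutes=5)):
    """Append the most recent entries of the rotated file to the new file.

    Both paths are hypothetical; returns the number of lines copied.
    """
    with open(rotated_path, encoding="utf-8", errors="replace") as src:
        tail = recent_lines(src.readlines(), now, window)
    with open(current_path, "a", encoding="utf-8") as dst:
        dst.writelines(tail)
    return len(tail)
```

You would run something like this once, right after the rotation, with a window at least as long as the polling interval. Anything already matched in the old file before the rename will then be matched a second time in the new one, which is where the duplicates come from.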
It is also worth noting that since agent 8.x, opcle by default reads only 50 lines at a time.
You can increase this setting by: ovconfchg -ns eaagt -set OPC_LE_MAX_LINES_READ=THE_NUMBER (if you set this to '0' there is no limit set).
I would put a few safety measures in:
Here is some info on how the above can help:
"The Logfile Encapsulator (opcle) provides the 'Read from last file position' option, which ensures that logfile entries are parsed only once. After the logfile has been monitored, 'last file position' bookmarks (current file size and last line hash values) are saved into memory, to be used in the next monitoring cycle as the starting point.
There are available some configuration variables (OPC_LE_SAVE_STATE, OPC_LE_STATE_FILE) which make opcle to save those 'last file position' bookmarks on a file when the opcle is stopped, to be used when it is started again."
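To make the bookmark idea concrete, here is a rough Python illustration of a reader that remembers the file size and a hash of the last line it saw. This is only my sketch of the general technique described in the quote, not how opcle is actually implemented; the function name and the bookmark layout are made up for the example.

```python
import hashlib
import os


def _digest(data):
    return hashlib.sha256(data).hexdigest()


def read_new_lines(path, bookmark=None):
    """Return (new_lines, bookmark) for `path`, resuming from `bookmark` if it still fits.

    `bookmark` is None or a dict with the keys "size", "last_len" and "last_hash".
    """
    start = 0
    with open(path, "rb") as fh:
        size = os.fstat(fh.fileno()).st_size
        if bookmark and bookmark["size"] <= size:
            # Check that the last line we saw is still where we left it.
            fh.seek(bookmark["size"] - bookmark["last_len"])
            if _digest(fh.read(bookmark["last_len"])) == bookmark["last_hash"]:
                start = bookmark["size"]
        # If the file shrank or the hash differs, start stays 0: re-read everything.
        fh.seek(start)
        lines = fh.read().splitlines(keepends=True)
    if lines:
        bookmark = {
            "size": start + sum(len(line) for line in lines),
            "last_len": len(lines[-1]),
            "last_hash": _digest(lines[-1]),
        }
    elif start == 0:
        bookmark = None  # empty (or freshly replaced) file: nothing to remember
    return [line.decode("utf-8", errors="replace") for line in lines], bookmark
```

With a reader like this, a file that has been renamed and replaced by a shorter one is simply read from the start. Whatever was appended to the old file after the last read is never seen, which is exactly the gap described in the original question.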
You may also have to use:
^ You may end up having to do the above if you are going to have a separate script obtain the last entries in the file prior to rotation and append them to the new file. In this instance, you would want to read from the beginning rather than the end.
There is a knowledge document on this:
In relation to the 5 second polling time: I have tested this in a message storm scenario (110,000+ lines in < 5 seconds, with over 10k matches). The CPU usage of opcle was low (from 1% to 15%, and this was even with close after read set, on a single core VM with only a few GB of RAM). The memory usage of opcle was also minimal, anything from 3 MB to 8 MB. Interestingly, the more lines were read, the lower the memory usage was (it went down to around 2.8 MB after processing).
This was tested on the latest HP Operations Agent 11.13 with the latest consolidated hotfixes applied.
The test also included routine commits every few seconds. The main bottleneck and CPU hog was the message agent, which caused the matched alerts to be delayed.
You can also perform an agent trace to analyse the performance of each read. The overhead is pretty minimal by today's standards and will only be a potential issue on old legacy systems with few resources available.
Re: rolling logfile policy missing alerts
If you are really concerned about it, have 2 policies, one for even days of the year and one for odd days of the year (you will need a small program to determine what the latest log file for each should be). As each policy would only turn over every second day, you would not lose any messages (apart from New Year's Eve in non-leap years).
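The small program could look something like this Python sketch. It assumes the live file is called app.log and rotated files are named app.log.YYYYMMDD after the day they cover; both names are placeholders I made up, so substitute your own naming scheme.

```python
from datetime import date, timedelta


def latest_day_with_parity(today, want_odd):
    """Return the most recent date (today or earlier) whose day of the year is odd/even."""
    day = today
    while (day.timetuple().tm_yday % 2 == 1) != want_odd:
        day -= timedelta(days=1)
    return day


def logfile_for_policy(today, want_odd, base="app.log"):
    """Return the file the odd-day (or even-day) policy should be watching on `today`."""
    day = latest_day_with_parity(today, want_odd)
    if day == today:
        return base                # today's entries are still in the live file
    return f"{base}.{day:%Y%m%d}"  # hypothetical naming pattern for rotated files
```

On 1 January following a non-leap year, the even-day policy steps back past 31 December (day 365, which is odd) to 30 December, so late entries in the 31 December file are not covered by the policy that is still looking backwards. That is the New Year's Eve gap mentioned above.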