jakubmichl Absent Member.

BufferOverLimit

Hi all,
We have been using Sentinel version 8.1.1.0_4309 for a few months and have discovered some new Sentinel events that we never saw on version 7.4.1.
1.
evt:"BufferOverLimit" Message: RT-Event-Queue-Active Views: Dropped 246324240000 event(s) since Tue Sep 18 15:53:11 CEST 2018.Total events dropped 248310560000.

The event has severity 1, and the same message appears for RT-Event-Queue-Correlation and RT-Event-Queue-Security Intelligence. We had a network issue between 15:53 and 16:03, when these events were generated, so we assume this is a new format for the log messages that used to say events were outside the 30 s window. However, we have not seen the "Failed To Correlate" event mentioned in the 8.1.1 Release Notes, so we are not sure of that.
According to the Storage health tab, the queue for these events is set to the default of 20,000,000. The reported number of dropped events cannot be real: 246,324,240,000 events over the roughly 600 seconds of the outage would mean 410,540,400 EPS, and we certainly do not have that.

2. The same type of event as in point 1, but for RawDataStorage:
evt:"BufferOverLimit" Message: RawDataStore-eventQueue: Dropped 96167 event(s) since Wed Sep 19 09:00:01 CEST 2018.Total events dropped 3910136.
This event is generated at random times throughout the day. According to the Storage health tab, the RawData Store queue is set to the default of 42,949,672,940,000.
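For what it's worth, the 410,540,400 EPS figure in point 1 is simply the dropped count divided by the roughly ten-minute outage (246,324,240,000 / 600 s). A minimal Python sketch of that check, assuming the message format matches the two samples above and that dropping stopped when the network issue ended. The regex and the `implied_eps` helper are hypothetical, not part of Sentinel:

```python
import re
from datetime import datetime

# Format inferred from the two BufferOverLimit messages quoted in this
# thread only (an assumption); other versions may word it differently.
BUFFER_MSG_RE = re.compile(
    r"(?P<queue>[\w-][\w -]*): Dropped (?P<dropped>\d+) event\(s\) since "
    r"(?P<since>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (?P<year>\d{4})"
    r"\.Total events dropped (?P<total>\d+)"
)


def implied_eps(message: str, window_end: datetime) -> tuple[str, float]:
    """Return (queue name, dropped events per second) for one message.

    The message only records when dropping started, so window_end is your
    own estimate of when it stopped. The time-zone token (e.g. CEST) is
    ignored, so window_end must use the same local time as the message.
    """
    m = BUFFER_MSG_RE.search(message)
    if m is None:
        raise ValueError("not a BufferOverLimit message")
    since = datetime.strptime(f"{m['since']} {m['year']}", "%a %b %d %H:%M:%S %Y")
    seconds = (window_end - since).total_seconds()
    if seconds <= 0:
        raise ValueError("window_end must be after the 'since' timestamp")
    return m["queue"], int(m["dropped"]) / seconds


msg = ("RT-Event-Queue-Active Views: Dropped 246324240000 event(s) since "
       "Tue Sep 18 15:53:11 CEST 2018.Total events dropped 248310560000.")
# Assumed end of the drop window: ten minutes later, when the network issue ended.
queue, eps = implied_eps(msg, datetime(2018, 9, 18, 16, 3, 11))
print(f"{queue}: {eps:,.0f} EPS")  # RT-Event-Queue-Active Views: 410,540,400 EPS
```

The RawDataStore message in point 2 parses the same way, but since it gives no end time either, the result is only as good as the window you assume.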

In both cases we do not see any unusual records in the log files.
Thanks for your help with the investigation.
davidkrotil Super Contributor.

Re: BufferOverLimit

Obvious question: have you tried upgrading to version 8.2? If you have current maintenance, I would do that as the first troubleshooting step.
jarivaahtera Absent Member.

Re: BufferOverLimit

We have seen these BufferOverLimit events when the system can't handle the EPS coming into it. In our case the cause was a runaway event source generating thousands of events per second for no reason. The events that warn you about correlation are still there in versions 8.1 and 8.2. This line is from version 8.2:

Mon Oct 01 15:30:57 EEST 2018|SEVERE|TimerThreadPool pool|esecurity.ccs.comp.correlation.TreeCorrelationBuffer$EventDroppedErrorReporter.report
In the previous 900,004ms, 3 events from collector Microsoft DHCP (35B22AF0-C881-1035-9EA8-005056B365B7) were not correlated as time difference was greater than delay of 30,000 ms


Also, today when I enabled event visualization, we got these messages in server0.0.log:

Mon Oct 01 14:27:38 EEST 2018|SEVERE|pool-22-thread-1|esecurity.ccs.comp.event.visualization.EventVisualizationProcessor$3.onFailure
Failed to forward 5,000 events in this batch, copying back to persist queue
java.util.concurrent.TimeoutException


I think these messages also appear because the system can't handle the volume of logs being pushed in. I could not find any information on how much EPS event visualization can handle.
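To see which components are logging SEVERE entries, and how often, the server0.0.log excerpts above can be grouped into entries: a pipe-delimited header (timestamp, level, thread, logging class) followed by message and stack-trace lines. A minimal sketch, assuming every entry starts with a header in the format shown in this thread; `parse_entries` and `severe_by_source` are illustrative helpers, not a Sentinel tool:

```python
import re
from collections import Counter
from typing import Iterable, Iterator, NamedTuple

# Header layout inferred from the server0.0.log excerpts in this thread:
# "<timestamp>|<LEVEL>|<thread>|<class.method>" (an assumption). Lines that
# do not match are treated as the body of the preceding entry.
HEADER_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2} \w+ \d{4})\|"
    r"(?P<level>[A-Z]+)\|(?P<thread>[^|]*)\|(?P<source>.+)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    source: str
    body: list[str]


def parse_entries(lines: Iterable[str]) -> Iterator[LogEntry]:
    """Group raw log lines into entries, attaching continuation lines."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = HEADER_RE.match(line)
        if m:
            if current is not None:
                yield current
            current = LogEntry(m["ts"], m["level"], m["thread"], m["source"], [])
        elif current is not None and line.strip():
            current.body.append(line)  # message or stack-trace line
    if current is not None:
        yield current


def severe_by_source(lines: Iterable[str]) -> Counter:
    """Count SEVERE entries per logging class/method."""
    return Counter(e.source for e in parse_entries(lines) if e.level == "SEVERE")


SAMPLE_LOG = """\
Mon Oct 01 14:27:38 EEST 2018|SEVERE|pool-22-thread-1|esecurity.ccs.comp.event.visualization.EventVisualizationProcessor$3.onFailure
Failed to forward 5,000 events in this batch, copying back to persist queue
java.util.concurrent.TimeoutException
""".splitlines()
entries = list(parse_entries(SAMPLE_LOG))
print(entries[0].level, len(entries[0].body))  # SEVERE 2
```

For a real log, iterating over the opened server0.0.log file and calling `.most_common(10)` on the result of `severe_by_source` would list the noisiest sources first.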
ScorpionSting Absent Member.

Re: BufferOverLimit

You'll find your disk space disappearing quickly. The "persist queue" is stored in the relevant directories under /var/opt/novell/sentinel/data/buffers/ (off the top of my head). When I had issues with Elasticsearch crashing, this volume became 100% full, which just exacerbated the problem.
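A quick way to keep an eye on this is to total the size of each subdirectory under the buffers path and check how full the volume is. A minimal Python sketch; the default path is the one recalled above and is an assumption to verify on your own install, and `buffer_report` is a hypothetical helper:

```python
import os
import shutil
from pathlib import Path

# Path as recalled in the post above ("off the top of my head"); an
# assumption to verify on your own Sentinel installation.
BUFFERS_DIR = Path("/var/opt/novell/sentinel/data/buffers")


def dir_size(path: Path) -> int:
    """Total bytes of regular files below path (symlinked files skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.islink(file_path):
                continue
            try:
                total += os.path.getsize(file_path)
            except OSError:
                pass  # file vanished mid-scan; queue files come and go
    return total


def buffer_report(buffers_dir: Path = BUFFERS_DIR) -> tuple[dict[str, int], float]:
    """Return ({subdirectory: bytes}, percent of the volume in use)."""
    sizes = {p.name: dir_size(p) for p in buffers_dir.iterdir() if p.is_dir()}
    usage = shutil.disk_usage(buffers_dir)
    return sizes, 100.0 * usage.used / usage.total


if __name__ == "__main__":
    if BUFFERS_DIR.is_dir():
        sizes, pct_used = buffer_report()
        for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{size / 2**20:10.1f} MiB  {name}")
        print(f"volume {pct_used:.1f}% used")
    else:
        print(f"{BUFFERS_DIR} not found")
```

Run from cron, a growing subdirectory or a volume creeping toward 100% would show up before it starts compounding the problem.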

I couldn't find any way of tuning the allocation to the queue engines (not that I tried very hard), but I suspect it is set dynamically based on hardware (memory, etc.).

You might need to look at fine-tuning your event sources: see whether you can restrict what you collect to the information you actually need to monitor and alert on.

Visit my Website for links to Cool Solution articles.
rochfo Super Contributor.

Re: BufferOverLimit

I had this same error when I set eventvisualization.traditionalstorage.enabled to true in configuration.properties on the primary Sentinel server. The collector managers' EventVisualizationRouting directory had a huge number of files, each approximately 20-30 MB in size, and system memory was maxed out. Setting the property back to false removed these files, solved the memory issue, and the BufferOverLimit errors went away.
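To check the current setting, you can read the property straight out of configuration.properties. A minimal Python sketch; the file's location is not given in this thread, so the path is passed in, and treating a missing property as false is an assumption. The reader is deliberately simplified and ignores the whitespace separators, line continuations, and escapes that java.util.Properties supports:

```python
import re
from pathlib import Path

FLAG = "eventvisualization.traditionalstorage.enabled"


def read_properties(text: str) -> dict[str, str]:
    """Simplified Java .properties reader.

    Handles "key=value" and "key: value" lines plus '#'/'!' comments only;
    continuations, escapes and whitespace-only separators are NOT supported.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # blank line or comment
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        props[parts[0]] = parts[1] if len(parts) > 1 else ""
    return props


def traditional_storage_enabled(config_path: Path) -> bool:
    """True if FLAG is "true"; a missing key is treated as false (assumption)."""
    props = read_properties(config_path.read_text(encoding="utf-8", errors="replace"))
    return props.get(FLAG, "false").strip().lower() == "true"
```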

My systems aren't spec'd or set up correctly to run EventVisualization.
ScorpionSting Absent Member.

Re: BufferOverLimit

Thanks for that. I might see what I can do about it on mine.

brandon-langley Absent Member.

Re: BufferOverLimit

If you're having an issue with raw data writing, that's a pretty strong indicator that your disk subsystem may not be keeping up with the EPS. Seeing it on SI (Security Intelligence) and other areas tends to confirm that notion. The best analogy: it sounds like the system was carrying a heavy safe over its head, and then you tossed another safe on top of that.

System CPU utilization and memory consumption will tell you whether your issue has anything to do with pure processing, but raw data is relatively lightweight to process. Running iostat and looking at the await value will tell you whether you have long delays doing I/O (lower is better, higher is worse).
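The iostat check can be scripted: parse the device table from `iostat -x` and flag devices whose await exceeds a threshold you choose. A minimal Python sketch, assuming C-locale output (for example from `LC_ALL=C iostat -x 5 2`). Column names vary by sysstat version (older releases print `await`, newer ones only `r_await`/`w_await`), so the code falls back to the larger of those. The helper names, the sample figures, and the 20 ms threshold are illustrative only, not a documented spec:

```python
def parse_iostat_x(text: str) -> dict[str, dict[str, float]]:
    """Parse the device table(s) of `iostat -x` output.

    Returns {device: {column: value}}. With several reports (e.g.
    `iostat -x 5 2`) later reports overwrite earlier ones, so the result is
    the last interval; the first report is the average since boot.
    """
    stats: dict[str, dict[str, float]] = {}
    columns: list[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            columns = []  # a blank line ends the current table
            continue
        if fields[0].rstrip(":") == "Device":
            columns = fields[1:]  # header row: remember the column names
            continue
        if columns and len(fields) == len(columns) + 1:
            # Tolerate comma decimal separators from non-C locales.
            values = [float(v.replace(",", ".")) for v in fields[1:]]
            stats[fields[0]] = dict(zip(columns, values))
    return stats


def slow_devices(stats: dict[str, dict[str, float]], threshold_ms: float) -> dict[str, float]:
    """Devices whose await (or worst of r_await/w_await) exceeds threshold_ms."""
    slow = {}
    for device, cols in stats.items():
        if "await" in cols:
            wait = cols["await"]
        else:
            waits = [cols[c] for c in ("r_await", "w_await") if c in cols]
            if not waits:
                continue
            wait = max(waits)
        if wait > threshold_ms:
            slow[device] = wait
    return slow


# Made-up sample in the older sysstat layout, for illustration only.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.01     1.20    0.50    3.10    10.20    60.30    39.10     0.02    5.40    2.10    5.90   0.80   0.29
sdb               0.00    12.00    1.00   90.00    40.00  9000.00   198.00     4.10   45.00    3.00   45.50   1.00  95.00
"""
print(slow_devices(parse_iostat_x(SAMPLE), threshold_ms=20))  # {'sdb': 45.0}
```

Capturing the live output with `subprocess.run(["iostat", "-x", "5", "2"], capture_output=True, text=True).stdout` and passing it to `parse_iostat_x` gives the most recent interval rather than the since-boot averages.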

CPU and memory are among the biggest governors of performance, but if your disks do not perform at least to the spec we document, you will not be able to handle much EPS at all.