Moving Average data monitors stop consolidating events?
Hello, I'm having some trouble with Moving Average data monitors. Basically, they seem to randomly stop consolidating events, although I know that the events they should consolidate are still arriving. I know that events are still arriving because: - Creating an active channel with the data monitor filter shows thousands of events - Restarting the data monitor makes it work again. I attached some pics illustrating the problem: Pic1: Active channel with data monitor filter - Thousands of events with no timestamp problem Pic2: Dead silent data monitor Pic3: Data Monitor restart - It is working again. Has anyone experienced this issue? Rgds
Two possible explanations for this are: 1. You have a DM threshold set to drop the display group if the counts and/or statistics fall below a certain value, which you sometimes hit. 2. (A bit related to #1) The agent sending the events has a significantly different (or different enough) time from your manager (not counting timezones). The DM sets up the buckets (number, size, start-end times) based on the manager's time. It will place events in a bucket based on the end time of the event. To see if this might be your problem, you can set the active channel to show both end time and manager receipt time. These times should be almost identical. If they're off by some number of minutes, see how that difference compares to the buckets of the DM. For example, with #2, if you had 5 one-minute buckets on the manager (going back 5 minutes), but your end time was 10 minutes behind the manager, nothing would be added to any of the buckets. Sometimes the difference is right on the edge, so things seem to come and go a bit randomly. This is best fixed by standardizing the times of the machines sending events to the manager (as well as the manager's time, too). -- Dave
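To illustrate the bucket explanation in #2, here is a minimal Python sketch of how an event's end time could be mapped to a time bucket relative to the manager's clock. This is not ESM code: the function name `bucket_index`, its parameters, and the choice to ignore events whose end time is ahead of the manager's clock are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def bucket_index(end_time, manager_now, num_buckets=5,
                 bucket_size=timedelta(minutes=1)):
    """Return the bucket an event falls into (0 = most recent),
    or None if its end time lies outside the window.

    Hypothetical model only; the real DM logic may differ.
    """
    age = manager_now - end_time
    # Assumption: events "from the future" (agent clock ahead) are ignored.
    if age < timedelta(0):
        return None
    # Events older than the whole window match no bucket.
    if age >= num_buckets * bucket_size:
        return None
    # Integer division of two timedeltas gives a whole bucket count.
    return age // bucket_size

manager_now = datetime(2008, 6, 23, 12, 0, 0)

# End time 10 minutes behind the manager: outside all 5 one-minute buckets.
print(bucket_index(manager_now - timedelta(minutes=10), manager_now))

# End time 2.5 minutes behind: lands in the third bucket (index 2).
print(bucket_index(manager_now - timedelta(minutes=2, seconds=30), manager_now))
```

With a clock skew close to the window length (here, just under or just over 5 minutes), the same stream of events would fall in and out of the last bucket, which matches the "come and go a bit randomly" symptom described above.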
David, Thanks for the answer. However, neither explanation applies in my case: 1. All data monitors have a Group Discard Threshold of 0 2. As you can see in the screenshots I attached, there is no lag and no inconsistency between the end/agent/manager times. In the active channel I attached, the end time and Manager time are equal. This active channel has the same filter as the data monitor. Furthermore, the data monitor works perfectly after I restart it (as seen in the screenshots attached to the 1st post). Rgds, Frederic
Good point. I'd missed that you had both end time and manager receipt time in the channel. As you said, they look reasonable. I guess this moves on to more questions... 1. Which version of ESM is this? 2. What filter are you using on the DM? 3. Have you noticed any exceptions in the console or server logs? 4. How long do you have to run before this problem shows up? 5. By "restart", you mean you just disable and re-enable the DM? 6. What event rate (roughly) do you run (both raw and into this DM)? 7. Have you seen this problem with any other moving average DMs? I don't recall having heard of this problem before, and we'd probably have to try to reproduce it with a test alert agent feeding events into a manager. -- Dave
I have a problem which could be related: When displaying traffic from several devices with a moving average DM, if a device stops sending traffic for the DM time period, the device is removed from the dashboard, which is normal. However, when a removed device sends events again after its inactivity period, it does not reappear in the dashboard. After a restart, everything works fine again. I have noticed this behavior several times with several DMs and opened a ticket regarding this issue.
David, Below are the answers to your questions: 1) 4.0.1 SP3 2) The filter matches a device address and "Base Events" 3) As far as I looked, I did not see any exceptions regarding DMs 4) I don't know exactly, but this is random and can take days or even weeks before it stops working 5) Correct, Disable/Enable 6) The DM shown in the example has about 40000 events per 5 min (sample period), thus about 133 events/sec. The flow never decreases significantly 7) Actually it occurs for *all* the moving average / statistics data monitors I created. When I took the screenshots, all the DMs under a folder were no longer working (about 50 DMs) and I had to restart them all. Below are attached "before restart" and "after restart" pictures of a dashboard whose DMs were all silent before the restart and working again after the restart (except one, but this was normal at that time) [QUOTE=GCA]I have a problem which could be related :[/QUOTE] That might be an interesting point to look at. Fred
OK, nothing in your setup seems out of the ordinary. Did these ever work for you? If they did, do you remember what changed? Was it an ESM upgrade or something? As I mentioned before, I'm not aware of any problem like this, and this sounds pretty severe. If you haven't already, I would ask you to open a support call on this. We'll have to try to reproduce it in-house and find / fix the problem. -- Dave
David, I remember having trouble with those moving average DMs ever since we installed the 4.0 ESMs. I saw it occurring when I began creating Moving Average DMs while we were in the 3.5 -> 4.0 process, but did not really pay attention to the problem. I realised this might be a severe issue when I saw that this whole set of 50 DMs had stopped working and worked again after the re-enable. I created a support ticket, #080623-000021. Thanks for your cooperation. Fred
David, I am getting the same behavior, but the DM does not start working again after I disable/enable it.
By the way, I am using the "Event Throughput" built-in Moving Average DM.