Events delivery delay

Hi everyone.

I'm posting this because someone may have the same issue.

Right now we have the following topology:

firewalls (syslog) -> connector appliance -> Logger -> ESM

All our devices get their time from our NTP server, so we have ruled out a timezone mismatch.

When we create an active channel on the connectors we can see the events arriving at the ESM; the issue is that they are more or less 5 hours late. We tried to fix it with the time correction feature, but the correction only worked for 1.30 minutes.

After that 1.30 minutes the delay began to grow exponentially (1.30, 3 m, 9 m and so on), and it is still growing. Right now we can't use dashboards or correlation rules because we don't have those events in real time.

We think that maybe the device is waiting for more events in order to aggregate them, but it is taking too much time.

Do you think that may be the issue?

We have a ticket open with support, but we would like to hear more points of view, or maybe hear from someone who has had the same issue and fixed it.

Thanks in advance to everyone for your help.

Regards.

Alfonso.

  • Verified Answer

    Glad you got a handle on it, at least in the meantime. I always hated dealing with mysteries like this. I want to play with the content, not troubleshoot connector issues, you know?

    You make no mention of EPS, so it's hard to offer any advice at all. I can share some of my own experiences though. I see you have a lot of event sources that will probably feed ConApps over syslog, so just watch out for dropped events. Theoretically a syslog connector can handle well over 1000 EPS, and in theory you can have at least 8 on a ConApp, one per container. However, the reality is that all these parsers are CPU bound, and performance varies. Parsing Cisco events is notoriously slow: I can only get about 650 EPS from a 5.0.3 connector on a 6.0 ConApp with Cisco events. At that point the container has taken up all of the CPU available to it, about 25%, the equivalent of 100% of the core it's executing on. As the events keep coming in, another thread puts them into temporary files, waiting to be processed. However, this cache isn't limitless either, and once it fills up you start dropping events. You'll see errors in agent.log to this effect, so make sure to check it periodically. You can really only have about 3 really busy containers on a 4-core ConApp box, since the OS itself and the other containers need some CPU time as well.

    I guess what I am saying is that I don't think you are in the clear yet. You have quite a few "chatty" event sources, and only time will tell. Just go slow and let the dust settle between adding devices to your deployment.

    Good luck!
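
To put rough numbers on the overload scenario described in this reply, here is a minimal Python sketch of a connector that receives events faster than it can parse them. Only the 650 EPS parsing rate comes from the reply; the 800 EPS incoming rate, the 100,000-event cache limit and the function and field names are made-up illustration values. The model assumes constant rates and is not how the connector itself is implemented.

```python
from dataclasses import dataclass


@dataclass
class BacklogPoint:
    minute: int            # minutes since the overload started
    backlog_events: float  # events waiting in the cache
    lag_seconds: float     # time needed to drain the current backlog
    dropped_events: float  # cumulative events lost after the cache filled


def simulate_backlog(incoming_eps, capacity_eps, cache_limit_events, minutes):
    """Minute-by-minute backlog for a parser that is slower than its input.

    All parameters are hypothetical inputs chosen by the caller.
    """
    backlog = 0.0
    dropped = 0.0
    points = []
    for minute in range(1, minutes + 1):
        # Net events added to the cache in one minute; never below zero.
        backlog = max(0.0, backlog + (incoming_eps - capacity_eps) * 60)
        # Anything beyond the cache limit is counted as dropped.
        if backlog > cache_limit_events:
            dropped += backlog - cache_limit_events
            backlog = cache_limit_events
        # A newly cached event waits roughly this long before being parsed.
        lag = backlog / capacity_eps
        points.append(BacklogPoint(minute, backlog, lag, dropped))
    return points


if __name__ == "__main__":
    for p in simulate_backlog(800, 650, 100_000, 15):
        print(f"min {p.minute:2d}: backlog={p.backlog_events:8.0f} "
              f"lag={p.lag_seconds:6.1f}s dropped={p.dropped_events:7.0f}")
```

With these assumed numbers the delay keeps climbing for as long as the input stays above 650 EPS, and once the cache limit is reached the delay stops growing and events start being lost instead. If the incoming rate is below the parsing rate, the backlog stays at zero.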
