Events delivery delay

Hi everyone.

I'm posting this because someone else may have the same issue.

Right now we have the following topology:

firewalls (syslog) -> connector appliance -> Logger -> ESM

All our devices get their time from our NTP server, so we have ruled out a timezone mismatch.

When we create an active channel on the connectors we can see the events arriving at the ESM. The issue is that they are about 5 hours late. We tried to fix it with the time correction feature, but the correction only worked for 1.30 minutes.

After that, the delay began to grow exponentially (1.30, 3 m, 9 m, and so on) and it is still growing. Right now we can't use dashboards or correlation rules because we don't receive those events in real time.

We think that maybe the device is waiting for more events in order to aggregate them, but it is taking too much time.

Do you think that may be the issue?

We have a ticket open with support, but we would like to hear more points of view, or from someone who has had the same issue and fixed it.

Thanks in advance to everyone for your help.

Regards.

Alfonso.

  • Alfonso,

    Since your ESM gets the events from Logger, you have to start your investigation there. Turn off time correction, since I believe it's just confusing the issue, and open up a channel in ESM. Compare Manager Receipt Time (when ESM received the event) with Agent Receipt Time (for events forwarded on by Logger, this will be when Logger received the event) and with Device Receipt Time/End Time (usually the time the event happened).

    If the Manager Receipt and Agent Receipt times are more or less in line, then the problem occurs somewhere earlier in the flow and you have to go to Logger to investigate. If you find a discrepancy between the Manager and Agent receipt times (more than a few minutes), then it's likely that the forwarders on your Logger are having problems keeping up with the flow.

    Depending on the version of Logger you are running, note that there is a bug in 4.0 that results in lower EPS to the manager than the Logger is truly capable of. See this KB article: https://arcsight.custhelp.com/cgi-bin/arcsight.cfg/php/enduser/std_adp.php?p_faqid=3454 (KB3454)
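The timestamp comparison described in this reply can be sketched in a few lines of Python. This is only an illustration, not an ArcSight API: the function name `locate_delay`, the 5-minute threshold (the reply only says "more than a few minutes"), and the sample timestamps are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical threshold; the reply only says "more than a few minutes".
THRESHOLD = timedelta(minutes=5)


def locate_delay(device_receipt, agent_receipt, manager_receipt,
                 threshold=THRESHOLD):
    """Return (hint, lag) describing which hop adds the delay for one event."""
    # Time between Logger receiving the event and ESM receiving it.
    logger_to_esm = manager_receipt - agent_receipt
    # Time between the event happening and Logger receiving it.
    device_to_logger = agent_receipt - device_receipt

    if logger_to_esm > threshold:
        return "logger-forwarder", logger_to_esm
    if device_to_logger > threshold:
        return "upstream-of-logger", device_to_logger
    return "no-significant-delay", max(logger_to_esm, device_to_logger)


# Made-up sample values, copied by hand from an active channel.
FMT = "%Y-%m-%d %H:%M:%S"
device = datetime.strptime("2010-10-12 07:00:00", FMT)
agent = datetime.strptime("2010-10-12 07:00:05", FMT)
manager = datetime.strptime("2010-10-12 12:00:05", FMT)

where, lag = locate_delay(device, agent, manager)
print(where, lag)  # logger-forwarder 5:00:00
```

With these sample values the gap sits between Agent Receipt Time and Manager Receipt Time, which is the case where the Logger forwarder would be the first thing to check.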

  • Thanks, it seems that I have that issue. I will update the box and let you know.

    Have a great day.

    Regards.

    Ing. Alfonso Alejandro Reyes Jiménez

    From: Gary Portnoy

    Sent: Tuesday, October 12, 2010, 12:30 p.m.

    To: Alfonso Alejandro Reyes Jimenez

    Subject: Re: - Events delivery delay

  • Hi Gary.

    We updated the Logger, but the results were the same.

    It seems that we are trying to send more events than the Logger can handle. The weird thing is that the connector appliance has no issues with them. If we point it directly at the ESM, everything arrives in real time. I was wondering if the forwarding connector has some kind of limit on events.

    Maybe there's a way to balance the delivery to the ESM. Have you seen that?

    We have checked everything: the speed and duplex settings of the box and the connector appliance (we upgraded the appliance to version 6.0 and the connectors to version 5.x).

    We don't know what else to do, so we are just waiting for support's answer.

    Right now not all the devices are sending logs, and we have disabled some logs on the only firewall that sends events to the Logger.

    The Logger can handle more events than the ESM, right?

    Regards.

    Alfonso.

  • Alfonso,

    Loggers do have a limit on EPS. Check the documentation, because it depends on the model. You never mentioned how many EPS your ConApps are sending, so it's hard to guess what the problem might be, or if there even is one. I know you can set up multiple forwarders to the same ESM; just make sure their filters are mutually exclusive. Maybe that will help?

    If you go to the system monitoring screen and look at the throughput from your forwarder strictly on a network level, is there a pretty standard ceiling that you are hitting? If you find that you are maxing out around 2 MBps, for example, get a support code, log on through SSH, and try to SCP (SSH copy) a large file from there to your ESM to see what throughput SCP gets. If it's the same as what you are seeing in your Logger GUI, then it's the network itself. BTW, are all of these components (ESM, ConApp, Logger) in the same datacenter?

    Let us know what you find out.
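The SCP check suggested in this reply boils down to comparing two throughput numbers. A minimal Python sketch of that comparison follows. The function names, the 15% tolerance, and the sample file size and copy time are assumptions for illustration only. The 2 MBps figure is the example ceiling from the reply, not a measured value.

```python
def scp_throughput_mbps(file_size_bytes, seconds):
    """Throughput of a timed SCP copy in megabytes per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return file_size_bytes / seconds / 1_000_000


def network_is_bottleneck(forwarder_mbps, scp_mbps, tolerance=0.15):
    """True when the forwarder ceiling is within `tolerance` of the SCP rate.

    If SCP cannot go meaningfully faster than the forwarder, the network
    itself is the likely limit; if SCP is much faster, look at the forwarder.
    """
    if scp_mbps <= 0:
        raise ValueError("scp_mbps must be positive")
    return abs(forwarder_mbps - scp_mbps) / scp_mbps <= tolerance


# Hypothetical measurement: a 1 GB file that took 500 seconds to copy.
scp_rate = scp_throughput_mbps(1_000_000_000, 500)
print(scp_rate)                              # 2.0
print(network_is_bottleneck(2.0, scp_rate))  # True
print(network_is_bottleneck(2.0, 50.0))      # False
```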

  • Hi Gary.

    Yes, we actually have a Logger L7200x, which according to the documentation doesn't have restrictions (on events in general), but my question was more focused on the number of events that the forwarding connector can handle.

    To answer your question, all the devices are in the same datacenter and subnet, and all of them are on a gigabit Ethernet link. Just to be sure that it wasn't a manager issue, we used iptraf to check the network traffic, and the results weren't even close to 5 mb/s at rush hour.

    We have now upgraded the devices (2 Loggers) to the latest free version (4.5 GA) and disabled some irrelevant logs on the firewalls. That seems to have fixed our issue: right now we have a 4-second delay, which doesn't harm anybody.

    Tomorrow we are going to update the connector appliance to the latest version (6.0) and test the latest version (5.0) of the connectors. We are still trying to get the best times because we still have to add another 9 Cisco ASA firewalls (almost 70,000 users), 2 Cisco IronPorts (Mail), 3 Cisco IronPorts (Web), 4 Cisco IPS and 4 Cisco NAC appliances (2 CAS and 2 CAMs). We need to integrate all of them into the ESM.

    According to the presales team, the devices (1 ESM, 2 Loggers and 2 connector appliances) can handle all of them.

    What do you think?

    Anyway, I will let you know of any update. Again, thank you very much for your help.

    Regards.

    Alfonso.

  • Verified Answer

    Glad you got a handle on it, at least for the time being. I always hated dealing with mysteries like this. I want to play with the content, not troubleshoot connector issues, you know?

    You make no mention of EPS, so it's hard to offer any advice at all. I can share some of my own experiences though. I see you have a lot of event sources that will probably feed ConApps over syslog, so just watch out for dropped events. Theoretically a syslog connector can handle well over 1000 EPS, and in theory you can have at least 8 on a ConApp, one per container. However, the reality is that all these parsers are CPU bound, and performance varies. Parsing Cisco events is notoriously slow: I can only get about 650 EPS from a 5.0.3 connector on a 6.0 ConApp with Cisco events. At that point the container has taken up all of the CPU available to it, about 25%, the equivalent of 100% of the core it's executing on. As the events keep coming in, another thread puts them into temporary files, waiting to be processed. However, this cache isn't limitless either, and once it fills up, you start dropping events. You'll see errors in agent.log to this effect, so make sure to check it periodically. You really can only have about 3 really busy containers on a 4-core ConApp box, since the OS itself and the other containers need some CPU time as well.

    I guess what I am saying is that I don't think you are in the clear yet. You have quite a few "chatty" event sources, and only time will tell. Just go slow and let the dust settle between adding devices to your deployment.

    Good luck!
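The caching behaviour described in this reply (events arrive faster than the parser can process them, pile up in a cache, and are dropped once the cache is full) can be illustrated with a small back-of-the-envelope model in Python. This is a sketch under stated assumptions, not how the connector is implemented: it assumes constant rates, and the 1000 EPS arrival rate and the cache size of 1,000,000 events are hypothetical. Only the 650 EPS parsing rate comes from the reply.

```python
def seconds_until_drops(incoming_eps, processing_eps, cache_capacity_events):
    """Seconds until the cache is full, or None if the parser keeps up."""
    surplus = incoming_eps - processing_eps
    if surplus <= 0:
        return None  # backlog never builds up
    return cache_capacity_events / surplus


def queue_delay_seconds(incoming_eps, processing_eps, elapsed_seconds):
    """Rough delay for an event arriving after `elapsed_seconds` of overload."""
    surplus = max(0, incoming_eps - processing_eps)
    backlog = surplus * elapsed_seconds  # events waiting in the cache
    return backlog / processing_eps      # time needed to work through them


# 650 EPS is the Cisco parsing rate quoted above; the rest is hypothetical.
print(seconds_until_drops(600, 650, 1_000_000))          # None
print(round(seconds_until_drops(1000, 650, 1_000_000)))  # 2857
print(round(queue_delay_seconds(1000, 650, 3600)))       # 1938
```

Under these assumed numbers, an hour of overload would already leave new events about half an hour behind, and the cache would fill in under an hour.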

  • Thanks for the good wishes. To answer your question, we are sending around 10,000 EPS with one connector appliance. The weird thing was that the connector appliance wasn't delaying the events: when we sent the events directly to the ESM, they arrived on time.

    We are now going to change the design. We will leave one connector appliance and one Logger for all the firewalls. The remaining event sources will go on the other connector appliance and Logger.

    We are also working on collecting just the relevant events from the firewalls, which should help us with the delay issues.

    Anyway, troubleshooting this kind of device is very interesting, but (like you) I prefer to create correlation rules, views and so on.

    I will let you know any update.

    Thanks.
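For reference, the figures quoted in this thread can be combined into a rough capacity estimate. The Python sketch below takes the 10,000 EPS reported here and the per-container figures from the verified answer (about 650 EPS per container for Cisco events, about 3 busy containers per 4-core ConApp). It assumes that all of the traffic parses at the Cisco rate, which may not hold: as noted above, the connector appliance in this deployment was keeping up. The function names are made up for the example.

```python
import math


def containers_needed(total_eps, eps_per_container):
    """Number of connector containers needed to parse `total_eps`."""
    return math.ceil(total_eps / eps_per_container)


def appliances_needed(total_eps, eps_per_container, busy_containers_per_appliance):
    """Number of ConApps needed if each runs only a few busy containers."""
    containers = containers_needed(total_eps, eps_per_container)
    return math.ceil(containers / busy_containers_per_appliance)


print(containers_needed(10_000, 650))     # 16
print(appliances_needed(10_000, 650, 3))  # 6
```

The gap between this estimate and the single appliance actually in use suggests that either the real per-container rate for this traffic is much higher than 650 EPS, or the 10,000 EPS figure is measured differently. It is a prompt for checking agent.log for dropped events, not a sizing recommendation.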

  • Hi,

    Can you check the following fields and let us know what you find: Original Agent Receipt Time, Agent Receipt Time, Manager Receipt Time, and Device Receipt Time.

    Regards,

    Vivek