SmartConnector Queue drops

Hi All,

We've been monitoring ArcMC and noticed that most of our connectors show some level of queue drops. While this went unnoticed for a long while, once we took it into consideration we began monitoring it very closely, and the drops are quite significant on most connectors. I'm not certain whether this is related to the volume of events, the types of events, the parsing performance of the connector, or something else entirely. We haven't been able to determine the root cause so far; it has been a month of investigation, even with the support team involved, but no breakthrough yet. Hoping for any known issues or suggested resolutions from fellow members here.

To give a brief overview of the issue:

We have a setup where a few syslog (UDP) connectors collect logs from three sources with very high EPS, and this is where the issues began:

Connector1 - collecting Cisco ASA logs - receiving avg. EPS of around 15-18k - constant Connector queue drops

Connector2 - collecting Fortigate logs - receiving avg. EPS around 13-15k - constant Connector queue drops 

Connector3 - collecting Bluecoat logs - receiving avg. 10k EPS - constant Connector queue drops

We were certain the high volumes were causing events to drop from the queue before they reached the processing cycle, so we deployed ArcSight Load Balancer and split the traffic across multiple connectors.

Cisco ASA - 6 connectors - 2-3k avg. EPS per connector - some of the connectors are functioning properly, while around 4 are still dropping from the queue.

Fortigate - 5 connectors - 2-3k avg. EPS per connector - all connectors are dropping from the queue.

In the above scenario, all the connectors have similar configurations - 4 GB JVM heap, Linux hosted, syslog UDP. Additional configuration: the ASA connectors have map files added; the Fortigate and ASA connectors have aggregation applied.
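For reference, in case it helps anyone comparing setups: on a Linux-installed SmartConnector the JVM heap is normally set in the Java Service Wrapper configuration rather than in agent.properties. A minimal sketch of what the 4 GB setting looks like; the file location assumes a standard connector install layout, so verify the path in your environment:

# <connector_install_dir>/current/user/agent/agent.wrapper.conf
# Initial Java heap size in MB
wrapper.java.initmemory=4096
# Maximum Java heap size in MB (4096 MB = the 4 GB mentioned above)
wrapper.java.maxmemory=4096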

Even without aggregation applied, the connectors drop events from the queue, so this is not related to the aggregation/filtering. We've ruled out EPS as the cause as well, since each connector is only processing around 2-3k EPS, which shouldn't be an issue at all. Parsing-wise there are no errors; the concern is purely events being dropped from the queue.

Suggested actions from support:

- adding a few parameters to agent.properties, such as enabling multithreading, increasing the queue size, increasing batch sizes, etc. - applied all of them, but the issue persists

- disabling the queue on the connector - rejected, as it's not an acceptable solution

- upgrading the connector to the latest 8.1 framework, as it has significant performance enhancements - done on one of the connectors to confirm, but the issue still persists

 

Attaching snips of the patterns observed in the queue drops.

 

Any helpful pointers to resolve the issue of connector drops are welcome.

Thanks.

 

Best Regards,

Zulfikhar Naiyar

 

  • Hi

    I mean, the suggestions from support should already do a lot of the trick...

    your issue:

    - Connector1 - collecting Cisco ASA logs - receiving avg. EPS of around 15-18k - constant Connector queue drops
    - Connector2 - collecting Fortigate logs - receiving avg. EPS around 13-15k - constant Connector queue drops 
    - Connector3 - collecting Bluecoat logs - receiving avg. 10k EPS - constant Connector queue drops

    Support's suggestion:

    - adding a few parameters to agent.properties, such as enabling multithreading, increasing the queue size, increasing batch sizes, etc. - applied all of them, but the issue persists

    What I would suggest: add the following to agent.properties:

    syslog.parser.multithreading.enabled=true
    # 2 or 4 threads per processor
    syslog.parser.threadsperprocessor=2
    syslog.parser.threadcount=-1
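    (One note on the above: agent.properties changes only take effect after the connector is restarted, and the connector logs its effective configuration to current/logs/agent.log at startup, so you should be able to verify there whether the threading parameters were actually picked up.)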



    Can you share a bit more detail on your underlying hardware?
    Are you running all three SmartConnectors on the same VM/hardware?

    The SmartConnector is quite I/O intensive, and if you are running all three connectors on one VM... this might be your bottleneck.

    18k EPS is close to impossible for a VM to handle... at least if the hypervisor is shared with other VMs. On a VM I never put more than 1-1.5k EPS per VM...

    If it is physical hardware... let me put it like this: I have seen a SmartConnector running at >45k EPS, but that was very beefy hardware.

    Hope this helps you investigate in the right direction... Feel free to come back.

    KR
