We have 300 connectors and 200 servers in operation. We are running into resource (CPU/memory/disk) allocation issues with our traditional architecture, where each server runs one connector or a handful of connectors. I'd like to know whether any solution can balance resources across connectors automatically - not an HA or active/standby architecture - so that resource exhaustion no longer causes connectors to go down abnormally. Thanks.
A few comments on this:
1) Multi-layered connectors is a viable architecture. It's a little complex to manage, but it works, and a good number of customers are using this approach. The bottom layer does the native log collection, forwards up using CEF over syslog to the next layer, and from there feeds into the ArcSight components - Logger, ESM, etc. Breaking the load down this way makes it easier to manage.
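To make the forwarding step concrete, here is a minimal sketch of what a first-layer node conceptually does: wrap a parsed event in a CEF line and forward it over syslog to the next layer. This is an illustration only, not the SmartConnector implementation; the vendor/product values, hostname, and port are hypothetical placeholders.

```python
import socket

def to_cef(vendor, product, version, sig_id, name, severity, extensions):
    """Build a CEF v0 line: CEF:0|vendor|product|version|sig_id|name|severity|extensions."""
    ext = " ".join(f"{k}={v}" for k, v in extensions.items())
    return f"CEF:0|{vendor}|{product}|{version}|{sig_id}|{name}|{severity}|{ext}"

def forward(cef_line, host="layer2-connector.example.com", port=514):
    """Send the CEF line over UDP syslog (hypothetical second-layer address)."""
    msg = f"<134>{cef_line}".encode()  # <134> = facility local0, severity info
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(msg, (host, port))

# Example: a collected event rendered as CEF, ready to forward upstream.
line = to_cef("Vendor", "Product", "1.0", "100", "login failed", 5,
              {"src": "10.0.0.1", "suser": "alice"})
```

Because CEF is a flat, pre-parsed text format, the second layer mostly relays it, which is why its overhead is low.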
2) Look to identify the bottlenecks and issues at the connectors themselves - you shouldn't underestimate how much optimization can be done at the connector layer. Beyond the memory/CPU you mention, there are other options: improving parser efficiency, removing unwanted parsers, and making sure you aren't overloading the connector with mapping and aggregation. Additionally, while configuring multiple destinations on a connector is simple and easy, each destination adds overhead; reducing this to a single destination and using the multi-layered approach is a good idea.
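The aggregation point above is worth illustrating. This is a simplified sketch of field-based aggregation, not the connector's actual implementation: identical events (by a chosen set of fields) within a batch collapse into one record carrying a count, which is how aggregation trades CPU at the connector for fewer events downstream. The field names are hypothetical.

```python
from collections import Counter

def aggregate(events, key_fields=("name", "src", "dst")):
    """Collapse events that share the same key fields into one
    record with a count - a toy model of connector-side aggregation."""
    counts = Counter(tuple(e.get(f) for f in key_fields) for e in events)
    return [dict(zip(key_fields, key), count=n) for key, n in counts.items()]

batch = [
    {"name": "login failed", "src": "10.0.0.1", "dst": "10.0.0.9"},
    {"name": "login failed", "src": "10.0.0.1", "dst": "10.0.0.9"},
    {"name": "port scan",    "src": "10.0.0.2", "dst": "10.0.0.9"},
]
rolled_up = aggregate(batch)  # 3 events become 2 records
```

The trade-off is exactly the one mentioned above: aggregation itself costs CPU and memory on the connector, so only aggregate where the downstream reduction is worth it.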
3) Consider the use of the Load Balancer for connectors, or the Event Broker. With the load balancer, you can take standard inbound data (such as syslog) and balance it across a number of connectors. It's not the most sophisticated method, but it's free and included in the SmartConnector framework, and it uses active probes to the back-end connectors to spread the load dynamically. You can find the documentation here:
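The "active probes" idea can be sketched in a few lines. This is a toy model of the concept only - the real load balancer is part of the SmartConnector framework - showing a TCP reachability probe and a trivial pick-first-healthy selection; a real balancer would also rotate across the healthy back-ends. The hostnames and the probe function are assumptions for illustration.

```python
import socket

# Hypothetical back-end connector addresses.
BACKENDS = [("conn1.example.com", 514), ("conn2.example.com", 514)]

def is_alive(host, port, timeout=1.0):
    """Active probe: can we open a TCP connection to the back-end connector?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def pick_backend(backends, probe=is_alive):
    """Return the first back-end that passes the probe, or None if all are down."""
    for host, port in backends:
        if probe(host, port):
            return (host, port)
    return None
```

The key property is that routing decisions follow live probe results, so a down connector stops receiving traffic without manual intervention.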
Also, the Event Broker is developing rapidly. It probably can't answer all of your issues right now, but it will likely solve a few in the very near future. An impending release at the end of March / start of April will add a lot more capability; importantly, all connectors can send to it, and it then acts as the distribution point to Logger, ESM, and so on (ESM must also be updated to the latest release, due at the end of March / start of April, to support this). That means you can use a single destination from the connectors to the Event Broker and let it distribute to Logger, ESM, and anything else you might have. This is a big efficiency gain, and it also soaks up peak loads on the ingest side.
4) Dig into the event loads themselves - be brutal about what you are ingesting. This needs constant review and analysis; it's too easy to ingest data that doesn't support the goal you are trying to achieve. For example, feeding data to ESM for correlation purposes is great, but only if a rule will actually use it. If not, don't do it. Reducing the load at the inbound side of a connector is simple and efficient, and the gain compounds across the whole infrastructure.
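The "only ingest what a rule uses" idea amounts to a simple allow-list filter at the inbound side. This sketch is illustrative, not an ArcSight feature: the signature IDs and field name are hypothetical, and in practice the allow-list would come from reviewing which event types your ESM rules actually reference.

```python
# Hypothetical: event signature IDs that are actually referenced by ESM rules.
RULE_USED_IDS = {"100", "200"}

def keep(event):
    """Drop any event whose signature ID no rule ever looks at."""
    return event.get("signatureId") in RULE_USED_IDS

inbound = [{"signatureId": "100"}, {"signatureId": "999"}, {"signatureId": "200"}]
ingested = [e for e in inbound if keep(e)]  # the "999" event never enters the pipeline
```

Dropping an event at the connector saves parsing, mapping, transport, storage, and correlation cost everywhere downstream, which is why this is the cheapest optimization of all.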
Hope this helps.
Thanks for your suggestion. Could you please give me more detailed information or related documents about the multi-layered architecture - for example, hardware requirements, connector configuration settings, and a system layout diagram? I would like to study whether it is suitable for our environment. Thanks a lot.
I don't really have any architecture diagrams, I'm afraid, mainly because the layout is unique to each customer instance.
However, the model is pretty simple and straightforward, so the HW requirements and OS setup are the same as for normal connectors. That said, since the EPS rates potentially increase as you go from the devices to the first and then the second layer of the architecture, the requirements do go up - so focus more on the HW for the second layer. Because we are forwarding CEF data, which is fast to process, the overheads there are lower and not too much of an issue.
So treat the first layer as having normal HW/SW requirements. The second layer is more important; typically its sizing matches the first layer, but it should be tested before you lock it in, as it varies by customer scenario.
Can you explain a little more about your environment and what you are trying to do - for example, locations, types of log source, and EPS rates? We can give you some recommendations from there.
After discussing your comments, we have changed our priority and will start by analyzing content efficiency. We want to analyze log usage rates in ESM - for example, how many times a given field or view is read by rules per week, and which connector's input carries the most important information for security analysis. Based on this analysis, we will optimize the current architecture to ensure stable quality for the high-priority logs. For this purpose, do you know of any reports or dashboards that could help us speed up this analysis in ESM? Thanks.