Guest post by Chas Clawson – Senior ArcSight Engineer at MicroFocus Government Solutions
Over the last few years, we’ve witnessed the massive growth of data being generated by interconnected IT systems. You may have heard the claim that 90% of all the world’s data has been generated in the last three years. That exponential trend shows no signs of slowing. With all of this data and metadata, the industry was faced with two challenges:
- How and where to store all this information.
- How to efficiently distribute and transmit it between systems that need to utilize it.
These challenges hold true within all areas of an organization's IT infrastructure, but they are critically true within security operations.
Fortunately, some very innovative advancements have evolved that helped solve these pain points. MicroFocus ArcSight has been leading the way with their Apache Kafka-based Event Broker and Data Platform solutions. Event Broker allows customers to scale their event ingestion rate like never before (up to 1 million events per second). These SecOps solutions now also allow this data to be consumed by any other third-party tool, including big data and analytic solutions such as Elastic, Hadoop or even Splunk. Although we have been clamoring for this scalable, high-throughput, clusterable system for event publishing for some time, as the cliché goes, we should be careful what we wish for!
Now we are faced with the next pain point: gleaning intelligence from the massive amount of available data while not drowning in the “data lake”. Extracting intelligence from data requires analytics and correlation that are both CPU and memory intensive. Arriving in style to the big data party, MicroFocus is now introducing distributed correlation to its ArcSight ESM platform. Before we examine how this works, it’s necessary to rewind a little bit and talk about the two approaches used when solving data processing limitations.
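To illustrate why a Kafka-style message bus lets several tools consume the same event stream, here is a minimal in-memory sketch in Python. It is not the Kafka or Event Broker API. The `EventLog` class, the consumer names and the event fields are all made up for illustration. The idea it shows is that events are appended once and each consumer tracks its own read position, so one reader never takes events away from another.

```python
from collections import defaultdict


class EventLog:
    """In-memory stand-in for a single message-bus topic (not the Kafka API)."""

    def __init__(self):
        self._events = []                 # append-only list of published events
        self._offsets = defaultdict(int)  # next unread position per consumer

    def publish(self, event):
        self._events.append(event)

    def poll(self, consumer, max_events=100):
        """Return events this consumer has not read yet and advance its offset."""
        start = self._offsets[consumer]
        batch = self._events[start:start + max_events]
        self._offsets[consumer] = start + len(batch)
        return batch


log = EventLog()
for i in range(5):
    log.publish({"id": i, "name": "login_failure"})

siem_batch = log.poll("siem", max_events=3)  # the SIEM reads the first three events
analytics_batch = log.poll("analytics")      # an analytics tool still sees all five
siem_rest = log.poll("siem")                 # the SIEM resumes from its own offset
```

Because the log is never drained by a read, adding another downstream consumer does not change what the existing ones receive.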
Scaling “up” or “out”
Typically, architects are faced with the choice of “scaling up” or “scaling out”. Scaling up is the big iron approach of building bigger and badder custom machines with insane hardware specs that geeks like to brag about when not MMO gaming (massively multiplayer online gaming, for the uninitiated). This works for some applications, but cost is a huge factor and frequent “fork-lift” upgrades are cost prohibitive. Pioneers like Google and Facebook took a different approach by “scaling out”. This approach spreads the load across clustered systems, often built with commodity hardware, and comes with the added benefit of fault tolerance and availability that improve as the cluster grows.
So what does that have to do with a security information and event management (SIEM) application and its correlation engine? A well-known software engineer once said, “All problems in computer science can be solved by another level of indirection.” In practice, this translates to very modular software designs in which separate functions interface to solve a more complex problem. If done right, this allows core software components to be decoupled and, in some cases, distributed across hardware systems, like a colony of unstoppable army ants… or a rampaging herd of elephants.
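To make “scaling out” a little more concrete, here is a rough Python sketch of one common way to spread load across a cluster: hash a key from each event and use the result to pick a node. The node names, the event fields and the helper functions are hypothetical, and this is not a description of how ArcSight distributes work internally.

```python
import hashlib


def assign_node(key, nodes):
    """Pick a node for a key using a stable hash (same key, same node)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(events, nodes):
    """Group events by the node responsible for their source address."""
    buckets = {node: [] for node in nodes}
    for event in events:
        buckets[assign_node(event["src"], nodes)].append(event)
    return buckets


# 1,000 synthetic events from 20 source addresses
events = [{"src": f"10.0.0.{i % 20}", "seq": i} for i in range(1000)]

three_nodes = partition(events, ["node-a", "node-b", "node-c"])
four_nodes = partition(events, ["node-a", "node-b", "node-c", "node-d"])
```

Adding a fourth node gives the same events one more place to go, which is the essence of scaling out. Note that simple modulo hashing moves many keys when the node count changes. Schemes such as consistent hashing exist to reduce that reshuffling.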
Under the SIEM hood
Under the hood, ArcSight ESM has always had distinct services processing and analyzing events in real time. The event flow goes: Connector (Event collector) → Persistor (CORRE Database) → Correlator Service (Rule filters) → Aggregator Service (Time windows and event match counts) → Correlation Event (Intelligence). Until now this was all nicely packaged up into what we called the “Manager.”
The manager was like an octopus juggling everything from rules & data monitors to database and active list reads and writes. As any seasoned SIEM engineer can attest, any one of these services can bring a manager to its knees. CPU and memory exhaustion from poorly written rules or bloated active lists are unfortunately too common. The fix? “Exploding” these manager services across clustered hosts! We will soon be able to run multiple instances of correlator and aggregator services across many hosts, while keeping the manager (persistor) focused on what it does best: writing and retrieving events from the CORRE database.
Inherently, Correlators are CPU intensive, Aggregators are memory intensive and the Persistor is disk I/O intensive. Bottom line: with distributed correlation, we now have the ability to scale out our SIEM in countless ways to meet the most demanding needs and complex use cases. Of course, compact all-in-one manager instances will still be an option.
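The event flow above can be sketched in a few lines of Python. This is a toy model, not ESM code. The rule (“failed logins”), the threshold of three matches in 60 seconds, the event fields and every function and class name are assumptions made for illustration. It only shows the division of labor: the persistor stores every base event, the correlator applies a rule filter, and the aggregator counts matches inside a time window and emits a correlation event.

```python
from collections import defaultdict, deque


def correlator(event):
    """Rule filter: keep only failed logins (a hypothetical rule condition)."""
    return event["name"] == "login_failure"


class Aggregator:
    """Fire once `threshold` matches from one source fall inside `window` seconds."""

    def __init__(self, threshold=3, window=60):
        self.threshold = threshold
        self.window = window
        self._times = defaultdict(deque)  # source address -> recent match timestamps

    def add(self, event):
        times = self._times[event["src"]]
        times.append(event["ts"])
        # Drop matches that have aged out of the time window.
        while times and event["ts"] - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            times.clear()  # reset so the next alert needs fresh matches
            return {"name": "brute_force_suspected", "src": event["src"], "ts": event["ts"]}
        return None


def run_pipeline(events):
    persisted, correlation_events = [], []
    aggregator = Aggregator()
    for event in events:                   # events as delivered by a connector
        persisted.append(event)            # persistor: store every base event
        if correlator(event):              # correlator: apply the rule filter
            alert = aggregator.add(event)  # aggregator: time window + match count
            if alert:
                correlation_events.append(alert)
    return persisted, correlation_events


events = [
    {"ts": 0, "src": "10.0.0.5", "name": "login_failure"},
    {"ts": 10, "src": "10.0.0.5", "name": "login_success"},
    {"ts": 20, "src": "10.0.0.5", "name": "login_failure"},
    {"ts": 30, "src": "10.0.0.9", "name": "login_failure"},
    {"ts": 45, "src": "10.0.0.5", "name": "login_failure"},
]
persisted, alerts = run_pipeline(events)
```

In this run all five base events are persisted, and the third failed login from 10.0.0.5 produces a single correlation event. The filter step is pure computation, while the aggregator has to hold state per source, which hints at why the two services stress different resources.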
Throwing more data at the correlation engine
Even with advances in network throughput, more robust storage options and more powerful processing hardware, most organizations find themselves constantly evaluating the cost-benefit of event ingestion into their centralized SIEM and analytic tools. With distributed correlation, we now have the ability to throw much more data at the correlation engine, which in turn “bubbles up” the events of interest (EOI). Events that previously may have been too much to handle, such as endpoint logs, threat intelligence matches, DNS logs or net flows, can now be used in the correlation logic, providing more contextual data around EOI and improving the fidelity of alert rules. With different storage retention policies, these correlation events can be retained longer than the base events themselves. ESM and its correlation engine thus act as a way to separate the wheat from the chaff while adding security context to raw data in real time, making it instantly usable for analysis. By design, this pairs synergistically with ArcSight’s new Event Broker which, as mentioned previously, is able to scale to provide ESM with event rates (events per second, or EPS) not previously achievable.
So in summary, what does distributed correlation provide?
- Improved correlation fidelity with more contextual event analysis
- More efficient use of resources as ESM dynamically identifies EOI
- Improvements to ESM availability and redundancy
- Better cost/performance flexibility
- Flexible expansion and capacity planning options
- Backwards compatibility with existing rules & content
- Ability to get more value from existing security tools and events
At the center of an intelligent SOC is the ability to efficiently extract intelligence from your data. Distributed correlation will be a powerful new way to scale out your SIEM’s analytics and event correlation engine in a cost-effective way.
Stay tuned for more blog posts about innovative ArcSight SecOps solutions. For more information on Event Broker, the exciting open architecture message bus built on Apache Kafka, see the new ArcSight Data Platform here or reference the whitepapers below.
Chas Clawson is a Senior Architect for Micro Focus government solutions. He was formerly a civilian with the NSA Red Team and has also been a SIEM architect supporting Fortune 500 companies with an MSSP now known as Optiv.