hyperthreading and performance
I am configuring new hardware for 6.5, later upgraded to 6.8. The install guides for both 6.5 and 6.8 give the same guidance on Hyper-Threading, stating simply: "Disable this". HP also has a hardware whitepaper on BIOS settings for low-latency Linux environments which says Hyper-Threading "adds logical cores but increases computational jitter". I have also heard a couple of stories of large performance improvements, as measured in EPS, after disabling it.
I am not entirely certain why this is recommended and was wondering if anybody has a technical explanation for why Hyper-Threading might cause performance issues. Does anybody know how this could be experimentally tested?
Please let me know and thanks.
It's done because the default scheduler for Red Hat (CFS) isn't optimized for SMT. As an alternative to disabling HT, you could set CPU affinities so that the virtual CPUs are not used by ArcSight. For a non-IO-bound thread, a virtual CPU core (i.e. a Hyperthread) has roughly 20% of the performance of a real CPU core.
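If you want to experiment with the affinity approach instead of flipping the BIOS switch, the idea is to pick one logical CPU per physical core and pin the process to just those. A minimal Python sketch of that selection, assuming the standard Linux sysfs topology files; treat it as illustrative, not a supported ArcSight configuration:

```python
import glob
import os

def physical_only(sibling_lists):
    """Given sibling strings like "0,32" or "0-1" (one per logical CPU, as
    found in /sys/devices/system/cpu/cpu*/topology/thread_siblings_list),
    return the lowest-numbered sibling of each physical core."""
    chosen = set()
    for s in sibling_lists:
        ids = [int(x) for x in s.replace("-", ",").split(",")]
        chosen.add(min(ids))
    return chosen

if __name__ == "__main__":
    paths = glob.glob("/sys/devices/system/cpu/cpu*/topology/thread_siblings_list")
    if paths and hasattr(os, "sched_setaffinity"):
        lists = [open(p).read().strip() for p in paths]
        cpus = physical_only(lists)
        print("pinning to one logical CPU per core:", sorted(cpus))
        # Pin the current process (and any children it launches) to those CPUs.
        os.sched_setaffinity(0, cpus)
```

Running something like this as a wrapper before starting the Manager (or using `taskset` with the same CPU list) leaves HT enabled for the rest of the box while keeping ArcSight off the virtual cores.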
As far as testing... that's what Bleep is for.
Yep, that's right!
Hyper-Threading cripples performance, based on the internal performance testing we have done. It works, but you hit an upper limit pretty quickly, and it's far from efficient. Disable it, and suddenly the upper limit is much, much higher. When we work on the really big systems, the effect is even more dramatic!
It's an odd one: technically it should improve things, but it doesn't. From internal discussions I have had a mixture of answers, the best being the one here. But I have also heard about conflicts between the OS, the JVM and ESM itself - though I don't quite believe that myself.
Big note though - we are IO bound to a degree: clearly, the more data coming in, the more gets written to the DB. But if you take that off the table (for example with SSD storage, where IOPS can and should be massive), then we see much better performance and more consistency overall (which is what we want). I always recommend going for high IOPS to "take it off the table"; memory is the next biggest thing to consider.
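To put rough numbers on "taking IO off the table", a back-of-envelope sizing helps. The event size, EPS and write-amplification figures below are placeholder assumptions for illustration, not measured ArcSight values:

```python
def required_io(eps, avg_event_bytes, write_amplification=2.0, io_size_bytes=4096):
    """Rough storage sizing: sustained write bandwidth and 4 KiB IOPS needed
    to keep event persistence from becoming the bottleneck.
    write_amplification covers indexes, logs, and filesystem overhead."""
    bytes_per_sec = eps * avg_event_bytes * write_amplification
    return bytes_per_sec / 1e6, bytes_per_sec / io_size_bytes  # (MB/s, IOPS)

# Hypothetical workload: 20,000 EPS at ~1 KB per stored event.
mb_s, iops = required_io(eps=20000, avg_event_bytes=1000)
print(f"~{mb_s:.0f} MB/s sustained, ~{iops:.0f} IOPS")
```

Compare the result against the sustained (not burst) figures for your storage; if the headroom is small, IO is still very much on the table.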
That makes sense. Regarding bleep, I'm afraid I have never used it. I looked around a little, and it looks like it will do everything we might need. The admin guide makes the interesting point: "Do not run bleep on the Manager host". I take it that another Manager is configured and used as the "bleep host"? Based on the content here on Protect, it doesn't seem like many people configure it that way. Is that true?
Thanks for your continued help.
I think most people run bleep from a different host. Alternatively, consider exporting some event files from your current system and replaying them through the replay connector (called the Test Connector when you install a SmartConnector).
You can export a batch of events, convert them into an .events file, and replay them back. Try to create these from your own events (so the test is based on what you have and what you expect to see, not on random sample events we might provide), and create maybe 3-4 sets of .events files. Then run the Test Connector a few times and feed in the events. You can turn the event rate up or down and monitor the Manager to see what the impact is.
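The "turn the rate up / down" step can be scripted rather than driven by hand. The sketch below paces a list of events at a target EPS and hands each one to a send callback; the .events file handling and the actual connector feed are left out, and the function names are my own invention:

```python
import time

def replay_at_rate(events, eps, send):
    """Deliver each event via send() at roughly `eps` events per second,
    using an absolute schedule so timing errors don't accumulate."""
    interval = 1.0 / eps
    next_due = time.monotonic()
    for ev in events:
        delay = next_due - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send(ev)
        next_due += interval

if __name__ == "__main__":
    sent = []
    # Ramp test: replay the same small batch at increasing rates while
    # watching the Manager dashboards for the knee in the curve.
    for rate in (100, 500, 1000):
        replay_at_rate(["ev1", "ev2", "ev3"], rate, sent.append)
    print(f"delivered {len(sent)} events")
```

Stepping the rate up between runs like this makes it much easier to spot the EPS level where the Manager stops keeping up.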
Some of the standard monitoring dashboards are pretty good for watching the data and the hit it takes, but also look at some of the logfu data and the CPU / IO figures at the OS level. That way you should get a pretty good view of where the upper limits are and what type of performance profile you have.