ESM 6.8 Performance tuning...
To continue on the discussion originally started for perf tuning ESM 6.0 in our environment (thank you to everyone who participated, special thanks to jbur for a very collaborative effort!), we have finished up our perf tuning guide for 6.8. Also, a big thanks to my team for working hard and providing a lot of input, this was not an effort of a single person.
Please feel free to add, comment, or suggest any changes that you have found in your environment, as these have been specifically tuned for our environment.
Short summary: The goal of our deployment is for each ESM to sustain 20-30k EPS with burst capability of >60k EPS (to clear caches and handle influxes of data). With all base content disabled on ESM 6.8 patch 3, we are able to get the below configs above 80k EPS (we no longer test for max capacity - 80k was sufficient given our requirements). We do have pretty high quality hardware, but we have seen high-end VMs get quite a performance boost out of a number of these tuning components.
Standard disclaimer: A lot of these values will vary based on your hardware and environment, so please do not blindly implement these values without first understanding (and probably testing in a non-production environment) how they will impact your system. Also - if you have any questions, feel free to post here, but HP Support should be your primary resource for support. We have a pretty heavy workload, so we don’t have the capacity to check these forums often.
Some of our hardware varies, but generally
- 4 x Xeon processors with at least 8 cores running at least 2ghz (e5/e7)
- 256gb-768gb RAM
- ~10tb+ of Fusion IO drives (we have 3 and 6tb drives running in RAID 10 in our systems)
- Note: Fusion drives seem to be overkill, you will hit CPU and application bottlenecks before you hit IO issues with this config - specifically we haven't been able to sustain > 10% IO utilization on any ESM to this point, even with perf testing at 80k+ EPS.
Some high level key notes:
- CPU speed and cores are highly valuable for the ESM to process high volumes of EPS. If you want performance, do not go cheap on the CPUs
- As an example, we actually had some real-life differences measured recently, when comparing our two systems:
4 x Intel(R) Xeon(R) CPU E7- 4850 @ 2.00GHz (40 logical cores)
vs 4 x Intel(R) Xeon(R) CPU E5-4650L 0 @ 2.60GHz (32 logical cores)
We found the difference to be almost EXACTLY 20% better. With the same content, memory settings, config and perf tunings, similar FusionIO disks, and data, EPS ingestion cap was 20% higher and response times also reflected this.
- We haven't seen valuable gain in having more than approximately 300gb of RAM
- Instead of expensive hard drives, you can get away with using many (10-20+) SSD or spinning disks to achieve high IOPS and EPS in a RAID 0/10 configuration
- BIOS Configs:
- Disabling Hyperthreading seems to help (but hasn't been quantified)
- Disabling power saving settings on some servers has had serious performance increases (certain HP servers - by default they throttle CPU and other components)
- Installing only necessary packages is pretty important. Understand what you need to run your environment, and try to install the 'Minimal' if possible, adding all necessary packages after the OS is built.
- Setting all connectors and ESM servers OS clocks to GMT seems to help relieve many time related issues that may crop up later
- Again - installing minimal / only needed packages to run the environment seems to help quite a bit.
- If you have high quality SSDs or drives, consider using tuned
- We make some tuning configurations to 90-nproc.conf and sysctl.conf
- Specific callout for the vm.swappiness=0 setting in sysctl.conf - if the ESM processes ever swap to disk, it will have drastic performance impacts. Prevent this by making sure you have enough RAM and tuning this setting, before it's too late. If you are swapping you can release the swap with some scripts
- ESM configs
- Threads threads threads! Threads are important. Ensure you set both the ESM and the connector threads appropriately based on your EPS requirements (being smart about increasing threads is very important, don't just blindly add more threads ). I don't know of a good guideline/document for this at the moment, will update if I find one.
- Allocating the right amount of memory to the ESM process is important. We have found instability (meaning database connectivity issues) in setting this above 32gb of RAM, because of long-running full GCs.
- If you have a High EPS system, increasing/setting your logger jvm heap is also helpful
- There are a number of potential MySQL tunings in my.cnf, but there were two key callouts for us: innodb_buffer_pool_size (mysql memory), and sort_temp_limit (for trends/reports)
- A number of the other tunings are worthwhile to investigate, but haven’t had as much of a noticeable impact to our system performance or stability.
- Enabling the slow query log is extremely helpful for finding bad content; ensure that you roll this log :)
- a. DISABLE : System Options -> Processor Options -> Intel(R) Virtualization Technology
- b. DISABLE : System Options -> Processor Options -> Intel(R) Hyperthreading Options
- c. DISABLE : System Options -> Processor Options -> Intel(R) VT-d
- d. Change to Maximum Performance : Power Management Options -> HP Power Profile
- e. Change to HP Static High Performance Mode : Power Management Options -> HP Power Regulator
- f. Change to Maximum Cooling : Advanced Options -> Thermal Configuration
- g. DISABLE : Advanced Options -> Advanced System ROM Options -> Power-On Logo
a. Installation type: Minimal
b. Additional packages to install:
yum -y install nano rsync unzip cifs-utils xfsprogs sysstat lsof ntpdate ntp xorg-x11-xauth xorg-x11-xinit libXtst pciutils mdadm man xorg-x11-server-utils tzdata mlocate tuned
c. Add/update: /etc/sysctl.conf
#ArcSight ESM performance tuning
net.core.rmem_max = 33554432
net.core.rmem_default = 131072
net.core.wmem_max = 33554432
net.core.wmem_default = 131072
vm.swappiness = 0
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
d. Configure tuned for enterprise storage
tuned-adm profile enterprise-storage
e. Add elevator=noop to the end of the kernel line in the boot grub.conf
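To confirm after a reboot that the noop elevator setting from step e took effect (the kernel boot parameter for this is elevator=noop), the kernel exposes each block device's scheduler under /sys/block/<device>/queue/scheduler, with the active one shown in square brackets. A minimal sketch, assuming a Linux box with bash; the function names are made up for illustration and are not part of any ArcSight tooling:

```shell
#!/usr/bin/env bash

# Print the active I/O scheduler (the bracketed entry) from a scheduler line
# such as "noop anticipatory deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for every block device that exposes one.
# Devices without a readable scheduler file are skipped.
report_schedulers() {
    local f dev
    for f in /sys/block/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f#/sys/block/}
        dev=${dev%%/*}
        printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
    done
}
```

Running `report_schedulers` on a box booted with the noop elevator should list `noop` next to each device. The sysctl values from step c can be loaded without a reboot using `sysctl -p`.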
a. Set ESM JVM Heap to 32gb (server.wrapper.conf)
b. Modify ESM server.properties to your needs (please note a number of these are *very* specific to our environment!)
# Do not automatically create new assets
# Increase each log file from 10MB to 100MB, required for high EPS systems troubleshooting
# modified from 1000000 to 1500000 for RepSM
# Automatically summarize bytesin/bytesout fields from the connector events
# If any destination receives more than 500 notifications within 1 day, start batching notifications
# Run reports in separate process to avoid clogging up ESM manager memory and processing
# Maximum number of failed logins to allow. A value of -1 means that there is no limitation.
# Allow up to 200 users to be created
# Allows many trends to run for long periods in high EPS systems
# Allow large package exports/imports - SYNC SCRIPT
a. Increase logger jvm heap (servers.sh)
ARCSIGHT_JVM_OPTIONS="-verbose:gc -Xms256m -Xmx8192m -XX:MaxDirectMemorySize=2500M -XX:+HeapDumpOnOutOfMemoryError -XX:-UseSplitVerifier -Djava.awt.headless=true "
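Because the options above include -verbose:gc, each collection is written out by the JVM, and long full GCs (the cause of the instability mentioned in the ESM notes) can be picked out of that output. A rough sketch, assuming the classic `[Full GC ...K->...K(...K), N secs]` line format and a log file path supplied by the caller; where this output actually ends up depends on your install, and the function name is hypothetical:

```shell
#!/usr/bin/env bash

# Print every "Full GC" line whose pause time is at least <threshold> seconds.
# Usage: long_full_gcs <gc-log-file> <threshold-seconds>
long_full_gcs() {
    local log=$1 threshold=$2
    awk -v t="$threshold" '
        /Full GC/ {
            # The pause time is the field immediately before "secs]".
            for (i = 2; i <= NF; i++)
                if ($i ~ /^secs/ && $(i-1) + 0 >= t) { print; break }
        }
    ' "$log"
}
```

For example, `long_full_gcs /path/to/gc-output.log 5` would list only the full collections that paused for five seconds or more (the path here is a placeholder).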
a. Lots of mysql configurations. Again, please size these to your environment, located in my.cnf. A number of these already exist in the default my.cnf file (with different values), so confirm/add/update them.
table_open_cache = 4096
sort_temp_limit = 256G
join_buffer_size = 64M
thread_cache_size = 5120
table_cache = 4096
innodb_read_io_threads = 8
innodb_write_io_threads = 16
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_additional_mem_pool_size = 8M
max_connections = 100
max_heap_table_size = 64M
open_files_limit = 65535
tmp_table_size = 64M
read_buffer_size = 64M
read_rnd_buffer_size = 256M
slow_query_log = 1
slow_query_log_file = /opt/arcsight/logger/data/mysql/mysql_server-slow.log
innodb_buffer_pool_size = 144G
innodb_log_buffer_size = 32M
innodb_flush_log_at_trx_commit = 2
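With the slow query log enabled as above, a quick way to see whether content is producing long-running queries is to count the entries above some duration. A sketch, assuming the standard MySQL slow-log header line (`# Query_time: <seconds> ...`); the function name and the threshold are illustrative only:

```shell
#!/usr/bin/env bash

# Count slow-log entries whose Query_time is at least <threshold> seconds.
# Usage: count_slow_queries <slow-log-file> <threshold-seconds>
count_slow_queries() {
    local log=$1 threshold=$2
    awk -v t="$threshold" '
        /^# Query_time:/ { if ($3 + 0 >= t) n++ }
        END { print n + 0 }
    ' "$log"
}
```

For example, `count_slow_queries /opt/arcsight/logger/data/mysql/mysql_server-slow.log 10` would print how many logged queries ran for ten seconds or longer.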
- Boot grub.conf file – with our research it seems noop or deadline were good choices for our environment, with noop seeming to make more sense for the fusion io drives specifically.
- tuned-adm is useful, but understand your storage type before configuring.
- Understand how much memory you have in your system and how much is available before setting the jvm heap for both ESM and Logger. If you only have 64gb of RAM, do not set these values to what we have, you will likely run out of memory. Ensure you have enough to run all processes on the box.
- ESM server.properties is highly configurable. If you look in the server.defaults.properties you can see a short description of any value and its default setting. These are extremely specific to individual environments and ESM needs, so please set these values with care. We specifically have found that our trends/reports take a *long* time to run and found it valuable to set these accordingly, along with a number of other requirements/custom solutions. Auto-creation of sensor assets destroys our ESM due to the high number of devices being logged in our environment, which is why this has been disabled.
- MySQL configurations are a massive discussion topic. In short, a number of these are not optimized in a default install, although I have been assured that HP will be doing work to help provide tuning guidelines for this in the future. You can see most of the research behind how we arrived at these values in the original ESM 6 perf tuning guide: https://www.protect724.hpe.com/thread/8414
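The memory note above comes down to simple arithmetic: add up the large fixed allocations and compare the total to physical RAM. A sketch using the values from this post (32 GB ESM heap, 8 GB logger heap plus roughly 2.5 GB of direct memory, 144 GB InnoDB buffer pool); the function name is made up, and real usage will be higher because the JVMs, MySQL and the OS all use memory beyond these settings:

```shell
#!/usr/bin/env bash

# Print the RAM (in GB) left over after subtracting the planned allocations.
# Usage: memory_headroom_gib <total-ram-gb> <allocation-gb>...
# A negative result means the planned allocations exceed physical RAM.
memory_headroom_gib() {
    local total_gib=$1
    shift
    printf '%s\n' "$@" |
        awk -v total="$total_gib" '{ used += $1 } END { printf "%.1f\n", total - used }'
}
```

With our settings on a 256 GB box, `memory_headroom_gib 256 32 8 2.5 144` prints `69.5`; the same settings on a 64 GB box, `memory_headroom_gib 64 32 8 2.5 144`, print `-122.5`, which is the out-of-memory situation described above.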
Sorry about the formatting. I tried multiple times writing this in Jive to find that it just slowed down to a halt and couldn't type or modify things. Then copied into word and tried to paste back... Thus the double bullets and bad spacing. I will try to find time and fix it later.
Fixed the formatting, I think. Also added a bit about our real-world scenarios with CPUs in the short summary at top under CPU info. Basically our core density on very similar CPUs had a huge impact (pretty much the exact % of performance as the % of extra cores).
Thanks for sharing. One addition to this could be huge pages (disable transparent huge pages tho).
So through our testing, I couldn't seem to get any noticeable results from using large/huge pages.
Do you happen to have any test results with what conditions gave improvements?
I think the reason (it's been a while, I will have to look back to make sure I am being accurate) we stopped using the Huge Pages configs was that newer Linux distros use Transparent Huge Pages, which had taken care of most of the issues behind the scenes for us. It also simplified our configuration.
...Also I just noticed that the vm.swappiness setting (in /etc/sysctl.conf) was somehow changed to m.swappiness in the original thread... If you copied the above config info (before it was changed), PLEASE note this change, as it is significant.
I don't have any benchmarks to share but the disabling of Transparent pages + the Huge Page recommendations (along with everything else in this post) stemmed from a professional service engagement to boost ESM performance.