Full GC problem
You probably haven't assigned enough Java heap memory. Please run
grep -A 1 "Full GC" server.std.log
to see the assigned maximum and the memory used after full GC.
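If you would rather not read the grep output by eye, here is a minimal Python sketch that pulls the same two numbers out of each Full GC line. It assumes the lines follow the classic HotSpot `-verbose:gc` layout (`[Full GC 1843200K->921600K(2097152K), 3.2145 secs]`); your server.std.log may look different, for example with detailed GC logging enabled, so treat the pattern and the hypothetical `summarize_full_gc` helper as a starting point only.

```python
import re

# Assumed line layout (classic HotSpot -verbose:gc), not confirmed for every ESM build:
#   [Full GC 1843200K->921600K(2097152K), 3.2145 secs]
FULL_GC = re.compile(r"Full GC.*?(\d+)K->(\d+)K\((\d+)K\)")


def summarize_full_gc(lines):
    """Return (used_after_mb, max_mb, percent_used_after) for each Full GC line."""
    results = []
    for line in lines:
        match = FULL_GC.search(line)
        if match is None:
            continue  # not a Full GC line in the assumed layout
        _before_kb, after_kb, total_kb = (int(group) for group in match.groups())
        results.append(
            (after_kb / 1024, total_kb / 1024, round(100 * after_kb / total_kb, 1))
        )
    return results


if __name__ == "__main__":
    # Made-up sample line for illustration; feed it open("server.std.log") instead.
    sample = ["[Full GC 1843200K->921600K(2097152K), 3.2145 secs]"]
    for used_mb, max_mb, pct in summarize_full_gc(sample):
        print(f"used after Full GC: {used_mb:.0f} MB of {max_mb:.0f} MB ({pct}%)")
```

If the percentage used after each Full GC stays close to 100, the heap is probably too small for the load; if it drops well down each time, the frequency is more likely driven by what the Manager is being asked to do.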
A Full GC every 10-20 minutes on an ESM is a little too frequent. This can be resolved by dedicating more memory to the Java process:
Edit /opt/arcsight/manager/config/server.wrapper.conf
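For orientation, the heap settings in that file are normally two Java Service Wrapper properties along the lines of the fragment below. The property names are my assumption based on the standard wrapper, and the values (in MB) are made up for illustration, not recommendations, so go by what your own file contains.

```
# Illustrative values only, in MB; not a sizing recommendation
wrapper.java.initmemory=4096
wrapper.java.maxmemory=8192
```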
The values noted there will show you what you are currently set to.
"Initmemory" is what the ESM carves out to start with. This makes the ESM more efficient because it does not need to grow to that size; it starts there. "maxmemory" is the ceiling you allow the ESM to grow to. Now here is the issue with a higher value: if and when the ESM gets to the point of needing a full GC, it stops all processing on the ESM while it clears memory back down to the initial limit. A healthy ESM should see a full GC every 30 minutes to an hour or so when under load.
This should help you. If this is a good answer, mark it as such.
Full GCs that trigger more often than the normal pace (about once an hour) can have many causes, but the main ones are:
1) Running large reports/trends very close together
2) The Manager is having insertion performance issues, etc.
You should look for answers to the following:
Did you see more caching events around the time of the incident?
What was the EPS at that time?
Did it happen randomly, around the same time window, or consistently, day and night?
Did you notice whether any large reports/trends were running at that time?
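To get a handle on the timing questions above, a small sketch like the following can help. It assumes you have already extracted the Full GC times from your logs as Python `datetime` objects (the timestamp format varies, so that parsing is left out), and the 30-minute threshold simply mirrors the "30 minutes to an hour" rule of thumb mentioned earlier in the thread. The function name `full_gc_pattern` and the sample times are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def full_gc_pattern(timestamps, min_gap=timedelta(minutes=30)):
    """Summarize how closely Full GCs are spaced and which hours they fall in."""
    ordered = sorted(timestamps)
    # Time between each Full GC and the next one
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    short_gaps = sum(1 for gap in gaps if gap < min_gap)
    # Count of events per hour of day, to spot a recurring time window
    by_hour = Counter(ts.hour for ts in ordered)
    return {
        "events": len(ordered),
        "short_gaps": short_gaps,
        "by_hour": dict(sorted(by_hour.items())),
    }


if __name__ == "__main__":
    # Made-up sample times for illustration only
    events = [
        datetime(2014, 3, 1, 2, 0),
        datetime(2014, 3, 1, 2, 12),
        datetime(2014, 3, 1, 2, 27),
        datetime(2014, 3, 1, 9, 40),
    ]
    print(full_gc_pattern(events))
```

A cluster in one or two hours points toward something scheduled (a report or trend), while short gaps spread evenly around the clock point more toward sustained load or an undersized heap.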
Adding more Java heap memory is the last resort. If Full GCs happen constantly over several days, then the Manager server may need more memory.
Calling it the last solution is not necessarily accurate. Having worked with HP/ArcSight Professional Services as well as their field engineers, one sees that the ESM (and connectors, for that matter) will do full GCs as a matter of course. The issue is normally how frequently they happen, because when they happen, everything stops.
On a SmartConnector, the connector stops and starts, and no events are processed during that time; this is when you may see your connectors caching.
On an ESM you may see all connectors suddenly show a cache and then clear, and dashboard refreshes are delayed.
I agree with Nellie that there are things one can look at to see what could be provoking the GC to happen more frequently. All in all, though, a full GC on an ESM every 10-20 minutes is too frequent, and adjusting those parameters may help, since the other points noted above can be harder to pin down as the source of the problem. I have found this to be true when working with HP engineers... it is a moving target.
The suggestion made above (by me) also has a caveat. If the max memory is set too high, then when a full GC does happen, remember that your ESM is not processing while it clears that space and shrinks the heap.
Good comment... just another tidbit.
I have been on the ArcSight ESM support team for the past 7 years. Lately I have often seen similar issues with the CORR-Engine where the Manager hit insertion-related problems.
I would suggest following KM00326893 to collect Manager thread dumps while connectors are caching (all connectors suddenly showing a cache and then clearing is a classic symptom of Manager insertion performance problems) and opening a support ticket as needed. That will help you find the real root cause instead of just adding more memory.
By way of a reminder, I am not the one having the issue. I was speaking as the voice of experience in the field. I do not think that years of support equal a problem being fixed, which is why I do not open with how many years I have been developing the product. I suspect that if one could open a support TT, they would have. But like many who visit protect 24/7, they have already gone down the HP support route with no real resolution, and so people ask questions of others in the field. Any performance issue can eventually be boiled down to content. To reiterate, full GCs are normal; the frequency of full GCs is usually the indicator of an issue (too often, with symptoms such as all connectors caching at once, etc.).
I am not saying that the advice to look at content, since it may be causing the issue, is wrong.
I am saying that the ESM's default settings in this area are sorely lacking and normally need to be bumped up.
So to answer your question, "what could be causing it?":
- Something inside the ESM that was changed (content, reports, trends, Active/Session Lists, etc.)
- Traffic entering the ESM
What can be done about it?
- Look into the ESM and see what has been added or changed that may have started this condition.
- Open an HP TT, run the log collection script, and have HP try to figure out what has changed.
- Adjust the Java memory parameters for the ESM since, at the end of the day, as more content is added to your ESM, you will find that HP will recommend these values be increased.