Established Member..

ESM6 ingest first indication of something else bad..?


-= 2015-01-05 Update -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Answer wrote:

You need to add a line in the server.defaults.wrapper.conf file, in the additional parameters section.

My line:  wrapper.java.additional.12=-XX:ReservedCodeCacheSize=256m

The index (12 in my case) needs to be adjusted depending on your file and changes you may have done to it.

NOTE: Do not modify this parameter without first consulting support.  This is a pretty dramatic change and there is definitely a wrong way to do it!
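
For illustration only, here is a rough sketch of what the additional-parameters section of server.defaults.wrapper.conf might look like with the new entry. The two existing entries and the index numbers below are made-up placeholders; match the index to the next free number in your own file, and keep the indices sequential with no gaps.

# Additional JVM parameters
# (the two entries below are hypothetical examples of what may already be there)
wrapper.java.additional.10=-Dfile.encoding=UTF-8
wrapper.java.additional.11=-XX:+HeapDumpOnOutOfMemoryError
# New entry: raise the JIT compiler code cache limit (check with support first)
wrapper.java.additional.12=-XX:ReservedCodeCacheSize=256m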


We have finally tested this in both our production and development environments.  We haven't had a recurrence in dev in about 3 months, or in prod in about 1 month.  That seems like a success.  We have had moments where the ESM will start to cache across the connectors, but it seems to recover without a restart.  Thanks for all the help and support, everyone.  I am sure there may be other symptoms that look similar, but this specific issue appears to be resolved on our system.

-= ORIGINAL MESSAGE FOLLOWS -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

I know this is a fishing expedition, but I'm hoping we aren't the only ones going through this...  Maybe as a group we can combine our knowledge and findings.

Basically, we have a system that we have been running in production at between 10-25k EPS on ESM6 patch 1.

System specs are beyond the recommendations (512 GB RAM, SSDs in RAID 10, 4x 10-core Xeons).


Anyway - under normal circumstances our system does not have issues (besides the ones already identified in other threads).  But periodically (randomly, ~1-2 times per month max; we have gone 2+ months with no issue) we hit a slowdown that is visible in nearly everything, and a manager service restart fixes the issue completely.

Noticed symptoms:
-Insertion rates overall go down drastically (all connectors - normally we maintain between 50-80ms ingest on monitor:102 events; after the slowdown, 200-500+ ms)

-ESM garbage collections happen extremely frequently (normally about one per hour; during the slowdown they occur sporadically, ~10-30 minutes apart)

-Large trends slow down drastically (usually take about twice as long)

-CPU utilization goes up dramatically (normally we maintain 10-20% CPU at peak; during the slowdown it spikes and stays at 60-70%+)

Our investigations have come up fairly dry:

-The slowdown does not occur at specific timeframes; thus far it seems random (occurring maybe 1-2 times per month max)

-It does not necessarily start during EPS peaks (we normally peak around 10am PST; this has occurred at night, on Sundays, and at 2pm - no definitive time/EPS correlation)

-I do have monitoring set up looking for resource changes (trends/lists/etc.) and nothing conclusive has come up

-Restarting *only* the ESM/manager service completely resolves the issue (not the server, MySQL, Logger, etc.)

      -^^^Adding to the point about the restart fixing it: ZERO other changes are made to the system (configuration, content, event filtering/aggregation, etc.)

To be honest, I am a bit at a loss...  Has anyone else come across something similar, found a resolution, or made headway?

Just for full disclosure, the link below contains our system specs and configured parameters:

https://protect724.arcsight.com/thread/8414

Edit: 9/24/2014 - I don't know why this wasn't done earlier... but I wrote a quick one-liner that will put all of your full GCs in order:

grep "Full GC" -A1 /opt/arcsight/manager/logs/default/server.std.log* | awk '{print $6 " " $7 " | " $9 " " $10}' | sort

Absent Member.

Answer wrote:

On my side, I see none of these events, except that when the manager is restarted, I get a manager:100

Thanks!  Is anyone seeing "manager:" or "database:" events when the ingest problem occurs?

-Joe

Respected Contributor.

Joe, manager: and other internal monitor: events do not appear when the manager is overloaded with events or is having an issue processing them.  It seems that at some point the manager "decides" that these are of lower priority ...

Absent Member.

Clark Kent,

From your other post:

"22-30K eps for 30 days+  ;  no reboot or manager restart  ;  no issues"

Does that mean you see the issue on your system too?

FYI - Sometimes you'll see the following when the manager is having issues with ingest. (But apparently not for what we're trying to fix in this thread)

manager:200

manager:201

database:102

-Joe

Established Member..

We have zero manager:200, manager:201, and database:102 events.  This occurred last night and there were none.

It triggered none of my 'critical system failure' rules, which cover a whole series of event IDs (including manager:200/201; I don't have database:102 in there, but I do have 100/101)...

Absent Member.

I am using Kaminario SSDs and 125 connectors into ESM, and I also have the issue with high CPU and no persistence. I am also suffering from the IPv6 error mentioned above. Support tried to blame the name of a connector, which is clearly not the issue. I am surprised that support was not aware of this asset creation problem, as it seems to have two documented tickets.

Absent Member.

Thanks!  How did you find this out? I was having an issue with ESM, and support pointed out this error and claimed it was my content.

Established Member..

Weird update: support just had us restart the rules engine (not the entire manager service) via the new version of manage.jsp.  That cleared up our GC issues, caches, and memory real quick... (I have struck through the rules engine restart because it causes some major irrecoverable problems.)

Going to monitor for a couple weeks to see if this maintains the same as a manager service restart...

Absent Member.

How do you restart the rules engine?

Established Member..

***This is not a command to be used lightly!!!  I would highly suggest troubleshooting this through support.

Log in to the ESM 6 advanced admin web console, go to System Management, then Rules Engine, and click Restart at the bottom.  I also didn't know of this capability before today.


(I have struck through the rules engine restart because it causes some major irrecoverable problems.)

Honored Contributor.


Well, when I opened my console this morning, I actually had the problem occurring. EPS was at 6k (normally 10-12k) and cache was piling up. So I tried restarting the rules engine, and to my surprise, it actually worked!!! EPS went to 20k almost instantly and the cache cleared up.

One thing I did not check was the CPU load, but lately I have had the problem every 4-5 days, so I'll have plenty of occasions to check it out!

Thanks a lot for that info, Ray; now everyone will stop complaining about the frequent manager restarts!

Absent Member.

This is awesome Ray, thanks for sharing!

So instead of restarting the manager every week, we just need to restart the rules engine?

I'm assuming any events the rule was accumulating towards a threshold would get dumped.

-Joe

Honored Contributor.

One thing I just noticed: I have a couple of dashboards that are not working after the rules engine restart. It is less of an impact on the analysts who are logged in, but bad if the dashboards stop working... (although I have seen this behavior with 1 or 2 dashboards after a manager restart as well)
