turgon2007 Absent Member.

WECS reports backlogged Agent


Hi,

Recently we installed the WECS service in a customer’s environment to
collect Windows events in Sentinel.

When the WECS service starts, the collector receives events at a normal
rate. In less than an hour, though, the WECS log shows that the
“Agent is backlogged” and sending of events is suspended. It then takes
a considerable time (sometimes hours) before sending of events resumes,
after which the agent is quickly backlogged again. The result is that
the EPS slows down to a trickle, sometimes to 7 EPS over a minute.

When checking server0.0.log on the Sentinel system, it reports the
following for the Active Directory and Windows Collector:
99% (14.52 min) Raw Data Waiting to be Parsed
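
For anyone wanting to track that figure over time, here is a minimal Python sketch that pulls the percentage and the wait time out of such a status fragment. It assumes the text appears exactly as quoted above; the rest of the server0.0.log line format is not shown in this thread, so the pattern only matches this fragment, and `parse_wait` is a made-up helper name.

```python
import re

# Matches only the fragment quoted in the post, e.g.
# "99% (14.52 min) Raw Data Waiting to be Parsed".
PATTERN = re.compile(
    r"(\d+)% \((\d+(?:\.\d+)?) min\) Raw Data Waiting to be Parsed"
)

def parse_wait(line):
    """Return (percent, minutes) if the line contains the fragment, else None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))

sample = "99% (14.52 min) Raw Data Waiting to be Parsed"
result = parse_wait(sample)
print(result)
```

Feeding every line of the log through `parse_wait` and keeping the non-`None` results would give a rough timeline of how long raw data sits waiting.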

The Sentinel server’s resources and the WECS server’s resources don’t
appear to be the bottleneck. Neither reaches its maximum capacity
when this occurs. The Sentinel server itself has eight 2.4 GHz CPUs and
16 GB of memory, and uses about half of that when the problem occurs.

The Collector has one WMI Connector attached to it, listening on the
(default) port 1024 and 7 (rather active) Active Directories running.

Does anyone have an idea how to troubleshoot and/or solve this
performance problem? Especially the part where raw data cannot be parsed
while the server resources are still plentiful.

Any help would be much appreciated.


Kind Regards,

Bart


--
turgon2007
------------------------------------------------------------------------
turgon2007's Profile: https://forums.netiq.com/member.php?userid=9632
View this thread: https://forums.netiq.com/showthread.php?t=53759

ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


Exact same thing in our environment....was hoping a Collector Manager
would help us out, but getting authority to build one is proving
extremely difficult....

What version of Sentinel Server?
What version of WMI Connector?
What version of Microsoft Active Directory and Windows Collector?
What settings do you have in eventManagement.config?


--
-"Also now available in 'G+'
(https://plus.google.com/u/0/112362149544381813153) and 'Website'
(https://www.isam.kiwi/) format".- 😉
------------------------------------------------------------------------
ScorpionSting's Profile: https://forums.netiq.com/member.php?userid=469
View this thread: https://forums.netiq.com/showthread.php?t=53759


Visit my Website for links to Cool Solution articles.
turgon2007 Absent Member.

Re: WECS reports backlogged Agent


Hi ScorpionSting,

Thanks for replying.

We're using Sentinel 7.3.0.0.
I can't connect to the customer's site at the moment, so I can't check
the exact version of the Connector and Collector, but I’m pretty sure
both are version 2011.1r4

In eventManagement.config, we've set the behavior to 'localhost'.
When we first encountered the problem, we tweaked the performance
option to:
<performance concurrency="300" messageQueueUpperBound="300000"
messageQueueLowerBound="50000"/>
But the performance problem occurred just the same.
All other settings in eventManagement.config are still at their
defaults.
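
When hand-editing this file it is easy to introduce a stray typographic quote or to swap the two bounds. Here is a minimal Python sketch that checks a `<performance>` fragment on its own. `check_performance` is a made-up helper, and the rule that the lower bound should sit below the upper bound is my assumption, not something taken from the WECS documentation.

```python
import xml.etree.ElementTree as ET

def check_performance(fragment):
    """Parse a <performance .../> fragment and return its attributes as ints.

    Raises ValueError if the fragment is not well-formed XML or if the
    queue bounds look inverted (assumption: lower bound < upper bound).
    """
    try:
        elem = ET.fromstring(fragment)
    except ET.ParseError as exc:
        raise ValueError(f"not well-formed XML: {exc}") from exc
    values = {name: int(text) for name, text in elem.attrib.items()}
    lower = values.get("messageQueueLowerBound")
    upper = values.get("messageQueueUpperBound")
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("messageQueueLowerBound should be below messageQueueUpperBound")
    return values

good = ('<performance concurrency="300" messageQueueUpperBound="300000" '
        'messageQueueLowerBound="50000"/>')
settings = check_performance(good)
print(settings)
```

This only validates the fragment; it says nothing about whether these values are good choices for a given load.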


ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


Interesting....wonder if it's an issue with the Collector... We're running
WMI 2011.1r4-201409300209-release,
Microsoft_Active-Directory-and-Windows 2011.1r5-201505071200-preview
(desperately needed SIDs), WECS v. 2011.1.4.1 (I'm about to check the
plugin for an update to this), and Sentinel 7.3.0.1_1817

I've tried tweaking our .config as:


Code:
--------------------
<protocol historyInterval="86400" historyQueryInterval="120" realTimeInterval="30" queryDelayInterval="500" cacheConnections="true" pushTimerInterval="10" pushDataLimit="50000" clockDelayInterval="5">
<taskManager activityTimeout="960" statusUpdateInterval="60" errorRetryInterval="300" />
</protocol>
--------------------


Real Time (realTimeInterval) got pushed up to try to decrease CPU load
on the DCs (x 9) when the query is performed.


ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


I've created an SR for this.... I'll be interested to see what
engineering make of it.


ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


First response was to add an additional WECS and link it to a secondary
collector to spread the load.... Easier said than done in this
environment, I won't get that till 2016/2017 FY with the 2015/2016 FY
just starting in a few days.


turgon2007 Absent Member.

Re: WECS reports backlogged Agent


Setting up a secondary Collector isn't an option for us either.
We did try to split up the load between several connectors, but if the
problem lies at Collector level, I suppose that won't solve the
problem.

We do have a similar setup with another customer though where
performance doesn't seem to be a problem at all.
2011.1r4-201404110301-release (Collector)
2011.1r4.201409300209-release (Connector)
Everything runs smoothly with the default settings and we divided
approx. 70 Event Sources over two Connectors (and 1 Collector).
While this customer's servers don't generate as many events as the one
where the performance problems do occur, the EPS is still considerable.

Maybe it's something in the environment?
Sentinel with the Collector performance problems is running on RHEL 6.5.
The other customer is running the Appliance (and has no performance
problems).


Knowledge Partner

Re: WECS reports backlogged Agent

On 06/26/2015 02:09 PM, turgon2007 wrote:
>
> Setting up a secondary Collector isn't an option for us either.
> We did try to split up the load between several connectors, but if the
> problem lies at Collector level, I suppose that won't solve the
> problem.


You could have two connectors, one under each of two collectors, which
would then split the load across processors better than you have right now
(assuming you can split the load across connectors).

> Maybe it's something in the environment?
> Sentinel with the Collector performance problems is running on RHEL 6.5.
> The other customer is running the Appliance (and has no performance
> problems).


SUSE is definitely better (disclaimer: I like SUSE). 😉

Honestly, I'd be a little surprised to see ONLY that make a significant
difference. If they have their RHEL box burdened with monitoring
software, or part of an over-utilized VM host, or given different hardware
resources, that'd be more likely in my mind.

--
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...
ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


We're running a SLES appliance, so not sure O/S will help much....


Anonymous_User Absent Member.

Re: WECS reports backlogged Agent


Hi,

One of the biggest performance blockers has been the communication channel
between the WECS server and the Collector Manager; usually most issues
were solved by adding an additional connector. For example, in an
environment where we collect about 200 EPS from Windows servers, we are
using 6 WMI connectors. It definitely decreases the delay in polling logs
from the Windows servers.


--
Arek.


--
arekmacak
------------------------------------------------------------------------
arekmacak's Profile: https://forums.netiq.com/member.php?userid=6080
View this thread: https://forums.netiq.com/showthread.php?t=53759

turgon2007 Absent Member.

Re: WECS reports backlogged Agent


Hi everyone,

Thanks to all of you for replying and helping out.

At the moment we're running 2 Collectors, each with 2 WMI Connectors.
I've tried to even out the load over these 2 Collectors.

I started this configuration 2 days ago and I haven't seen any "Agent
backlogged" events since.
EPS is around 700 now and the customer is commenting on the volume of
mails they're receiving as a result of the Correlation Rules that fire
now, so all seems well 🙂
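
For what it's worth, evening out the load by hand can be approximated with a simple greedy split: sort the event sources by how busy they are and keep handing the next one to whichever Collector currently has the least load. Below is a rough Python sketch of that idea. The source names and per-source EPS figures are invented for illustration (they are not measurements from this environment), and `balance_sources` is a hypothetical helper, not a Sentinel feature.

```python
import heapq

def balance_sources(source_eps, collectors):
    """Greedy split: busiest source first, always onto the least-loaded collector."""
    heap = [(0, idx) for idx in range(collectors)]  # (current load, collector index)
    heapq.heapify(heap)
    groups = [[] for _ in range(collectors)]
    for name, eps in sorted(source_eps.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        groups[idx].append(name)
        heapq.heappush(heap, (load + eps, idx))
    return groups

# Hypothetical per-source EPS figures, for illustration only.
sources = {"dc1": 250, "dc2": 200, "dc3": 80, "dc4": 70,
           "dc5": 50, "dc6": 30, "dc7": 20}
groups = balance_sources(sources, 2)
loads = [sum(sources[name] for name in group) for group in groups]
print(groups, loads)
```

The greedy approach is not guaranteed to be optimal, but it is usually close enough when there are a couple of very busy sources and a tail of quieter ones.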


turgon2007 Absent Member.

Re: WECS reports backlogged Agent


Thanks ab, I'm trying the two Collector approach at the moment.
First results are not very encouraging, but I'm not done experimenting
with the Collector/Connector setup yet.


brandon-langley Absent Member.

Re: WECS reports backlogged Agent


turgon2007;258453 Wrote:
> Setting up a secondary Collector isn't an option for us either.
> We did try to split up the load between several connectors, but if the
> problem lies at Collector level, I suppose that won't solve the
> problem.
>
> We do have a similar setup with another customer though where
> performance doesn't seem to be a problem at all.
> 2011.1r4-201404110301-release (Collector)
> 2011.1r4.201409300209-release (Connector)
> Everything runs smoothly with the default settings and we divided
> approx. 70 Event Sources over two Connectors (and 1 Collector).
> While this customer's servers don't generate as many events as the one
> where the performance problems do occur, the EPS is still considerable.
>
> Maybe it's something in the environment?
> Sentinel with the Collector performance problems is running on RHEL 6.5.
> The other customer is running the Appliance (and has no performance
> problems).


So there's some misunderstanding here that I think is worth clearing up.

We have two different sorts of connector types, and there is a distinct
behavior difference when it comes to performance and load-balancing.

1) There are "Event Source" based connectors, such as File, WECS, etc.,
where you specifically associate an event source with a collector.
Managing performance for these is challenging in the case of WECS, because
there can only be one instance of the collector/connector.

2) There are "Event Source Server" based connectors, such as Syslog and
SAM. These are very friendly to performance management, because one
event source server can actually service multiple collectors. So you
can have 5 collectors processing the output of a single Syslog/SAM event
source server. The 'Connector' in this case is just a virtual link
back to the Event Source Server itself.

Net/Net we have the following 'standard' choices we offer right now:

1) Agentless with WECS, but you're required to add additional WECS
agents to manage/balance load.
2) Agentless with SNARE, where you can easily balance the incoming load
across any number of collectors (as long as you have hardware), but you
take a performance hit per collector since we're full-text processing
vs. using some of the extended data provided by WECS/SAM. You also don't
get to yell at NTS over issues with the agent. 🙂
3) Agent-based with SAM, which is probably the most scalable option.
The more endpoints you support, the less middleware you use vs. WECS.
Also the middleware resource cost vs. WECS is substantially lower after
the first system.

------------

Now as for performance - where is your EPS capping out? How many
collectors/connectors are running on this system total? How many 'real'
(not logical) CPU cores? Generally speaking if you have sufficient
cores on your CM for all the collector/connector pairs (and ESS's) and
you're doing less than 1500 EPS per collector instance, we probably have
a problem. We actually test at a much higher EPS, so if you're doing
lower than that we know we either have a resource constraint in the
system or a performance constraint in the collector that we should
investigate (open an SR).


--
brandon.langley
------------------------------------------------------------------------
brandon.langley's Profile: https://forums.netiq.com/member.php?userid=350
View this thread: https://forums.netiq.com/showthread.php?t=53759

turgon2007 Absent Member.

Re: WECS reports backlogged Agent


Thanks for that clarification.

We're running the "WECS option": one middleware server that manages
all connections to the Windows Event Logs, mainly because our setup is
relatively small and a WECS server is the easiest way for this customer
to deploy this part of Sentinel. Judging by the performance statistics
on the WECS server, there doesn't seem to be a resource problem.
Memory, network and CPU are each about 50% in use.

Currently there are 2 Collectors installed, with 2 connectors each, and
I've tried to balance the load equally among the Collectors. There are
two very active ADs and the rest are moderately active.
In the last hour I've received 712,922 events (according to Sentinel,
post filtering; we're dropping Computer Generated and Object Access
Events on the Collector). At the moment both collectors are running
below 40% utilisation, so that's good.
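
As a sanity check, the hourly event count quoted above can be turned into an average EPS and compared with the 1500 EPS per collector figure mentioned earlier in the thread. This is a small Python sketch under two assumptions of mine: that the hourly count is spread over exactly 3600 seconds, and that it is shared evenly by the 2 Windows Collectors. The count is post filtering, so the raw intake per Collector is higher than what this shows.

```python
def average_eps(event_count, seconds):
    """Average events per second over the given window."""
    return event_count / seconds

EVENTS_LAST_HOUR = 712_922      # figure quoted in the post (post filtering)
WINDOWS_COLLECTORS = 2          # assumption: load shared evenly by both
REFERENCE_EPS = 1500            # per-collector figure mentioned in the thread

total_eps = average_eps(EVENTS_LAST_HOUR, 3600)
per_collector = total_eps / WINDOWS_COLLECTORS

print(f"total: {total_eps:.0f} EPS, per collector: {per_collector:.0f} EPS")
print("below reference figure:", per_collector < REFERENCE_EPS)
```

Even allowing for the filtered-out events, a Collector that backlogs at this rate is well under the reference figure, which points at a constraint somewhere other than raw throughput.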

We're running 10 Collectors in total on the Sentinel Server (we don't
have a separate Collector Manager yet). We have maybe 15 Connectors and
approx. 30 Event Sources in all. So a relatively small setup.

Current EPS according to the Collection Overview is 373. It seems to
hover around 250-300 during the day.

The server has 8 real sockets, although "real"... it's a VM. Load is
below 1.00. Memory is 32G and about half of it is actually in use.

But somehow, Collector 1, which doesn't even have the most active ADs
associated with it, reported "Agent is backlogged" to WECS. At least,
that's what I'm reading in the swecs.log.
On inspection that Collector had a Utilization of 99%, while having the
easiest job of both Collectors. I've created a third Collector and moved
all Event Sources from the 1st to the 3rd Collector, leaving the 1st
Collector to do nothing.

I've applied ScorpionSting's configuration for WECS, just to see if it
will make a difference.

Somehow performance tends to take a dive about an hour after restarting
the WECS service. I'm keeping my fingers crossed.


ScorpionSting Absent Member.

Re: WECS reports backlogged Agent


Okay, so I have just done this (no additional hardware required):



- Edit eventManagement.config so there are 2 endpoints:


Code:
--------------------
<client>
<endPoint address="tcp://sentinel.server.com:1024" behaviorConfiguration="localhost" />
<endPoint address="tcp://sentinel.server.com:1025" behaviorConfiguration="localhost" />
</client>
--------------------



- Edit the Sentinel firewall to allow the additional port
- I renamed the existing Collector and Connector to append "1"
- Created a new "Microsoft Active Directory and Windows 2" collector
- Created a new "WMI 2" connector (using the other port, but same
credentials)
- I just entered a dummy hostname to get the add accepted in the console
- Moved some of the event sources to the new collector and removed the
dummy host
- Restarted WECS
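
If more endpoints are needed later, the `<client>` block can be generated rather than typed by hand. Here is a rough Python sketch, assuming one host and a list of ports as in the example above. `build_client` is a hypothetical helper, not part of WECS or Sentinel, and the output still has to be pasted into eventManagement.config manually.

```python
import xml.etree.ElementTree as ET

def build_client(host, ports, behavior="localhost"):
    """Return a <client> element with one <endPoint> per port, as a string."""
    client = ET.Element("client")
    for port in ports:
        ET.SubElement(client, "endPoint", {
            "address": f"tcp://{host}:{port}",
            "behaviorConfiguration": behavior,
        })
    ET.indent(client)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(client, encoding="unicode")

snippet = build_client("sentinel.server.com", [1024, 1025])
print(snippet)
```

Each port still needs its own WMI Connector on the Sentinel side and a matching firewall rule, as in the steps listed above.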

