Anonymous_User Absent Member.

ndsd high processor utilization


After rebooting a fully patched OES11 server, the ndsd process goes up to
over 300% processor utilization. Other eDirectory servers cannot
"talk" to the box during this time. We have to issue an rcndsd stop
(which takes several minutes to shut down) and then restart it.
After the restart things seem to be OK (normal processor utilization),
but if we run ndsrepair -U -Ad we get lots of TimeStamp errors. The
on-site tech ran ndsrepair -P -Ad -A, picked the partition in question,
then ran Repair Timestamps and declared a new epoch. This does clear the
timestamp errors, but when we reboot the server, the same old problem
comes back (ndsd loads but jumps to over 300% processor utilization).

So, in short, things do NOT seem to be happy in ndsd land. Anyone have
any ideas on what the problem might be? I looked through the schema and
ndsd log files and there is nothing that seems out of the ordinary. No
errors, eDir thinks it started properly after the reboot etc.


--
blewis12
------------------------------------------------------------------------
blewis12's Profile: https://forums.netiq.com/member.php?userid=9352
View this thread: https://forums.netiq.com/showthread.php?t=53180


Re: ndsd high processor utilization

On 03/24/2015 07:55 AM, blewis12 wrote:
>
> After rebooting a fully patched OES11 server, ndsd process goes up to
> over 300% of processor utilization. Other eDirectory servers can not
> "talk" to the box during this time. We have to issue a rcndsd stop
> (which takes several minutes to shut down) and then we restart it.
> After the restart things seem to be ok (normal processor utilization)
> but if we run ndsrepair -U -Ad we get lots of TimeStamp errors. The on


Do not do an unattended repair unless you know you need it, and even then
you should be doing a regular repair (not unattended), which lets you do the
same things but often without locking the DIB, and in a more granular way.
Even that is usually the last step after a bit of troubleshooting which,
so far, I have not seen any mention of having been done.

> site tech ran ndsrepair -P -Ad -A and picked the partition in question,
> then ran repair Timestamps and declare a new epoch. This does clear the


Declaring epochs is not a trivial task and doing so makes me think you're
really, really grasping at straws. There is a good reason to do a
partition epoch, and it's not anything you've mentioned so far;
eDirectory's ndsrepair utility often reports about odd modification
timestamps, which is why ndsrepair is the last tool used for
troubleshooting purposes, because it reports things that are informational
as "errors" which leads to other decisions that are not good for a tree.

> timestamp errors, but when we reboot the server, the same old problem
> comes back (ndsd loads but jumps to over 300% processor utilization).
>
> So, in short, things do NOT seem to be happy in ndsd land. Anyone have
> any ideas on what the problem might be? I looked through the schema and
> ndsd log files and there is nothing that seems out of the ordinary. No
> errors, eDir thinks it started properly after the reboot etc.


Typically high utilization comes because of a request made that causes the
service to work really hard. Often those come in via LDAP, and LDAP
tracing is some of the easiest to analyze from ndstrace. Look at step 1,
and then the Linux (or cross-platform) section of step 2 from TID# 7007106:
https://www.novell.com/support/kb/doc.php?id=7007106

Sometimes disabling the LDAP service can help isolate this as well since
any LDAP clients submitting queries then cannot connect which may help
isolate the issue. Similarly, unplugging the NIC when loading eDirectory
can rule in/out traffic that is local vs. coming from a remote system.

Another thing to look at, LDAP or not, is the +RECM (Recman)
filter in ndstrace, as that shows queries coming in and if/which indexes are
used; a lack of indexes in searches CAN cause high utilization.
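
To put a number on the utilization figure discussed in this thread, here is a minimal sketch (not a NetIQ tool) that turns two readings of a process's /proc/<pid>/stat line into a CPU percentage. The function names are hypothetical, and the tick rate `hz` is an assumption you would normally get from `os.sysconf("SC_CLK_TCK")`. A result near 300% means the process kept roughly three cores busy over the sampling interval.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before tokenising the other fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_percent(ticks_before, ticks_after, interval_s, hz):
    """CPU use over the interval, where 100.0 means one fully busy core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval_s)


def sample(pid, interval_s=3.0):
    """Read /proc/<pid>/stat twice, interval_s apart, and return CPU percent."""
    hz = os.sysconf("SC_CLK_TCK")
    with open(f"/proc/{pid}/stat") as handle:
        before = cpu_ticks(handle.read())
    time.sleep(interval_s)
    with open(f"/proc/{pid}/stat") as handle:
        after = cpu_ticks(handle.read())
    return cpu_percent(before, after, interval_s, hz)
```

Calling `sample(pid)` in a loop right after a reboot, with the ndsd PID filled in, would give a timeline showing when the spike starts and when it drops off, which can then be lined up against trace output.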

--
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...

Re: ndsd high processor utilization


ab;255600 Wrote:
> Typically high utilization comes because of a request made that causes
> the service to work really hard. Often those come in via LDAP, and LDAP
> tracing is some of the easiest to analyze from ndstrace. Look at step 1,
> and then the Linux (or cross-platform) section of step 2 from TID#
> 7007106:
> https://www.novell.com/support/kb/doc.php?id=7007106

Message received on running ndsrepair and declaring a new epoch!

We rebooted the box again, and the problem shows up when we attempt to
issue ldapconfig set "LDAP Screen Level=all".

We get: Authentication failed for ouradminuser error: failed, transport
failure (-625)

So, in short, we can't do any tracing because we can't authenticate;
something is amiss with eDirectory.

Also, the ndsd process does finally drop to a normal/acceptable range,
but LDAP queries are not being processed and we cannot browse the sys
volume.

Just FYI, we are running ZENworks Satellite server/services on these
machines.

Also, a very odd thing: when I look at the ndsd.log file after the
reboot, all the timestamps are in the future, i.e. we restarted at
~8:45 a.m. but the log file is showing timestamps of 10:25 a.m.


I appreciate your response, but unless you have some other suggestions
it's looking like we may have to open a case with support.


--
blewis12

Re: ndsd high processor utilization

I presume, then, that you saw nothing via ndstrace as suggested; if so,
hopefully support can help.


--
Good luck.


Re: ndsd high processor utilization


No, I ran the traces pre-reboot and didn't see anything out of the
ordinary. Per my post we could not authenticate to eDirectory to run
the traces after the reboot. Once we stopped and started ndsd, traces
showed ok and all functionality returned. Seems the problem is only on
a reboot of the server.

Thanks,

Ben




--
blewis12

Re: ndsd high processor utilization

Looking back at your previous response I see now that for some reason my
client did not render things properly so I did not see any of your response.

Regarding tracing, you should be able to run the command to enable LDAP
tracing at any time and then just capture it later with ndstrace, meaning
you should have it set up ahead of time and not need to authenticate (and
get that -625) when the problem actually comes. Support will probably have
you do this once you connect with them since, one way or another, data
about the situation will need to be collected, and ndstrace is a good first
step.

The bit about timestamps being off is odd; if these are VMs, or even if
not, be sure your timestamps are reliable. ndsd consumes time from the
host, so if your time is off in a log somewhere then it is almost certain
your host time is fluctuating. You should probably see the same
fluctuation in other files, like /var/log/messages.
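
As a rough way to check for that fluctuation, the sketch below scans log lines for timestamps that are later than a reference time. It assumes the traditional syslog prefix ("Mar 24 10:25:01") that /var/log/messages normally uses; the ndsd.log layout is not shown in this thread, so the parsing would need adjusting for that file. The function name and the one-minute tolerance are made up for illustration.

```python
from datetime import datetime, timedelta


def future_entries(lines, now, tolerance=timedelta(minutes=1)):
    """Return (stamp, line) pairs for lines stamped later than now + tolerance."""
    hits = []
    for line in lines:
        try:
            # Assumed format: traditional syslog prefix in the first 15
            # characters. It carries no year, so borrow the year from `now`.
            stamp = datetime.strptime(
                f"{now.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            # Not a line that starts with a syslog-style timestamp; skip it.
            continue
        if stamp - now > tolerance:
            hits.append((stamp, line))
    return hits
```

For example, passing the lines of /var/log/messages together with the actual restart time (taken from a trusted clock) would list any entries written "ahead" of that time, like the 8:45 a.m. restart versus 10:25 a.m. log entries mentioned earlier in the thread.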

--
Good luck.


Re: ndsd high processor utilization


OK, I tried to capture a trace but it did not put anything new in the
ndstrace.log file so I'm either doing something wrong or it just doesn't
collect any information. Per the TID, I did not issue the ldapconfig
set "LDAP Screen Level=all" since it can't authenticate while the ndsd
process is at 300 percent'ish, but I did issue these commands
immediately after a reboot:

ndstrace
set ndstrace=nodebug
ndstrace +ldap
ndstrace +time
ndstrace +tags
set ndstrace=*r (to clear the trace file)
ndstrace screen on
ndstrace file on

Nothing new in ndstrace.log file.

Thanks,

Ben




--
blewis12

Re: ndsd high processor utilization

Anything show up if you add the RECM flag in ndstrace?

I always forget which other options will show you basically everything
going on in a useful way, but either ABUF or CBUF will show you all
interactions between clients and this server, as I recall. You could also
try SYNC and SKLK, which will show you replication, and JNTR, which will
show you the Janitor process, etc. (remember to put a + before anything you
are trying to add, just like TIME/TAGS/LDAP before). I'd probably try all
of those first and then ABUF and CBUF last, as they will almost certainly
flood you with stuff that is probably best handled by support.

LAN traces are also an option, particularly if you start it BEFORE
eDirectory loads (it's also possible to load ndsd without the DIB so that
you can have tracing going before things open up and start answering
client requests).

--
Good luck.
