Knowledge Partner

ndstrace just ... stops?

eDir 8.8.8.9 HF2

Ever seen where ndstrace just stops doing anything? I can go in, change the selected filters with "set dstrace=+thing" or whatever, but there is no activity shown at all. It works ok initially after startup, then just ... stops.

And, yeah, there's actual activity going on. Even something simple like an authentication server doing lots and lots of LDAP traffic, with +ldap enabled, shows nothing. A server running a couple of dozen IDM drivers, all with trace level 3, shows nothing with +dxml +dvrs.

Re: ndstrace just ... stops?

No, I have not, at least not in recent memory.

Out of curiosity, are you also writing to disk? If so, how big are the
files? If I/O there slows things down, would that slow down the screen?
Hopefully not, but who knows.

Tried current code? I am compelled to ask, so sorry about that.

Does it happen on a non-really-busy system, or other systems in the
environment, or just customer systems, or just this one system?

--
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below.

If you want to send me a private message, please let me know in the
forum as I do not use the web interface often.

Re: ndstrace just ... stops?

ab;2470813 wrote:
No, I have not, at least not in recent memory.

Out of curiosity, are you also writing to disk? If so, how big are the
files? If I/O there slows things down, would that slow down the screen?
Hopefully not, but who knows.

Tried current code? I am compelled to ask, so sorry about that.

Does it happen on a non-really-busy system, or other systems in the
environment, or just customer systems, or just this one system?



Yeah, me either.

Not writing to disk, just screen. Either interactive or with -l for console scrolling.

Current code is but a pipe dream with this client.

As far as I can tell, it's only happening on two of the ~24 systems in the environment. These are, naturally, the two busiest production systems. Their non-production counterparts are not affected. So it could be load or activity related.

These two share one other common trait. Every few hours, one of them loses its mind and can't find a network address for any server in the tree, including itself. -625 and -626 errors abound. Restart ndsd and it goes back to working normally for another few hours. Not sure if these are related. It seems like they might be. Got an SR open and escalated to backline on this already.

I know, the first reaction to seeing a -625 or -626 is always "network problem", but this one doesn't feel like a network problem. It feels more like something is getting lost in eDirectory. It could be a resource starvation problem, running out of threads or something like that. It doesn't seem to be running out of memory, and the VMs seem to have plenty of OS level resources.

Re: ndstrace just ... stops?

There are known older issues (probably on your version) where a -625
happens when replication doesn't happen fast enough, for example when you
try to replicate a million group membership values all at once and they
take more than a second or something to complete, and then eDir thinks
that the network connection isn't responding, so things fail. I'm making
up the numbers there, but that's the general idea.

Could that be related to ndstrace stuff? No idea.


Re: ndstrace just ... stops?

I have had "similar" issues in the past (2-3 times).
No driver traces were sent to the iMonitor screen. A driver restart didn't help.

After an eDirectory restart, the issue disappeared for a number of months.

Re: ndstrace just ... stops?

I believe it was 8.8 SP8 which finally fixed a problem where ndstrace
could lose lines if the internal buffer (something like 100kB) was filled
because of more data coming in than could get out, and I think the fix was
to slow down the system to not lose data. Perhaps that change brought in
some other issue, but that is why I was asking about logging to a file, or
other I/O issues.



Re: ndstrace just ... stops?

ab;2470827 wrote:
I believe it was 8.8 SP8 which finally fixed a problem where ndstrace
could lose lines if the internal buffer (something like 100kB) was filled
because of more data coming in than could get out, and I think the fix was
to slow down the system to not lose data. Perhaps that change brought in
some other issue, but that is why I was asking about logging to a file, or
other I/O issues.




Losing lines I doubt I'd have even noticed. I'm just looking at it to see activity. Even with something simple like +RSLV, which should be really busy all of the time, nothing at all shows up.

Re: ndstrace just ... stops?

ab;2470827 wrote:
I believe it was 8.8 SP8 which finally fixed a problem where ndstrace
could lose lines if the internal buffer (something like 100kB) was filled
because of more data coming in than could get out, and I think the fix was
to slow down the system to not lose data. Perhaps that change brought in
some other issue, but that is why I was asking about logging to a file, or
other I/O issues.




I'm starting to think that this "fix" you refer to is part of the problem here. After another ndsd restart, I got ndstrace running. This system has a client application that seems to be doing the equivalent of:


get all users - search by object class
for-each user
    search for groups that this user is a member of
end-for-each


The second search, for the groups, looks like it is un-indexed (member). So it takes a while to get through this loop. About 15-20 minutes. Then it waits (5 minutes) and does it again.

Watching this with ndstrace, now that it's working, the trace (screen and file) is currently 45 minutes behind reality.

No, that's not a typo. The timestamps in trace are 45 minutes ago. It's doing its best to show me every line of trace, but it's never going to catch up with this process kicking off every 20 minutes. I wonder what happens when whatever buffer it's using overflows...

Now, I don't normally sit and watch ndstrace scroll by. But if it is always maintaining that buffer, maybe based on whatever dstrace flags were last turned on, I could see it getting behind enough to die.

Re: ndstrace just ... stops?

On 02/13/2019 01:24 PM, dgersic wrote:
>
> I'm starting to think that this "fix" you refer to is part of the
> problem here. After another ndsd restart, I got ndstrace running. This
> system has a client application that seems to be doing the equivalent
> of:
>
> Code:
> --------------------
>
> get all users - search by object class
> for-each user
> search for groups that this user is a member of
> end-for-each
>
> --------------------


Um..... that sounds horrible. Why in the world would that ever be done?
I think you should introduce them to IDM, and the idea of
event-driven-ness, to stop this insanity.

> The second search, for the groups, looks like it is un-indexed (member).
> So it takes a while to get through this loop. About 15-20 minutes. Then
> it waits (5 minutes) and does it again.
>
> Watching this with ndstrace, now that it's working, the trace (screen
> and file) is currently 45 minutes behind reality.


Makes sense; doing bad things is bad. Give it a dozen more processors,
and faster I/O coming from ndstrace, and maybe a second datacenter. 😉

> No, that's not a typo. The timestamps in trace are 45 minutes ago. It's
> doing its best to show me every line of trace, but it's never going to
> catch up with this process kicking off every 20 minutes. I wonder what
> happens when whatever buffer it's using overflows...
>
> Now, I don't normally sit and watch ndstrace scroll by. But if it is
> always maintaining that buffer, maybe based on whatever dstrace flags
> were last turned on, I could see it getting behind enough to die.


I do not think that is how it works, but I do not have code access to
tell. I would strongly suspect it is NOT that way, as I've never noticed
a newly-started ndstrace instance starting out behind, even on a throttled
system, which would seem to be the way it must be if your theory is
correct. Nobody sane traces everything (or even many things per the last
ndstrace filters) all the time just to throw it away again, and doing so
would murder all eDirectory performance everywhere. Again, just my
rationalization of my own observations.


Re: ndstrace just ... stops?

ab;2495275 wrote:
On 02/13/2019 01:24 PM, dgersic wrote:
>
> I'm starting to think that this "fix" you refer to is part of the
> problem here. After another ndsd restart, I got ndstrace running. This
> system has a client application that seems to be doing the equivalent
> of:
>
> Code:
> --------------------
>
> get all users - search by object class
> for-each user
> search for groups that this user is a member of
> end-for-each
>
> --------------------


Um..... that sounds horrible. Why in the world would that ever be done?
I think you should introduce them to IDM, and the idea of
event-driven-ness, to stop this insanity.


Of course it's horrible. It's an application. Why? Because it's "easy". As much as I like the simplicity of working with LDAP, I kinda hate that applications developers are using it, because almost all of them get it so badly wrong.


ab;2495275 wrote:
On 02/13/2019 01:24 PM, dgersic wrote:

> The second search, for the groups, looks like it is un-indexed (member).
> So it takes a while to get through this loop. About 15-20 minutes. Then
> it waits (5 minutes) and does it again.
>
> Watching this with ndstrace, now that it's working, the trace (screen
> and file) is currently 45 minutes behind reality.


Makes sense; doing bad things is bad. Give it a dozen more processors,
and faster I/O coming from ndstrace, and maybe a second datacenter. 😉

> No, that's not a typo. The timestamps in trace are 45 minutes ago. It's
> doing its best to show me every line of trace, but it's never going to
> catch up with this process kicking off every 20 minutes. I wonder what
> happens when whatever buffer it's using overflows...
>
> Now, I don't normally sit and watch ndstrace scroll by. But if it is
> always maintaining that buffer, maybe based on whatever dstrace flags
> were last turned on, I could see it getting behind enough to die.


I do not think that is how it works, but I do not have code access to
tell. I would strongly suspect it is NOT that way, as I've never noticed
a newly-started ndstrace instance starting out behind, even on a throttled
system, which would seem to be the way it must be if your theory is
correct. Nobody sane traces everything (or even many things per the last
ndstrace filters) all the time just to throw it away again, and doing so
would murder all eDirectory performance everywhere. Again, just my
rationalization of my own observations.



I don't know either, just observing and guessing based on what I see.

Re: ndstrace just ... stops?

ab;2495275 wrote:
On 02/13/2019 01:24 PM, dgersic wrote:
>
> I'm starting to think that this "fix" you refer to is part of the
> problem here. After another ndsd restart, I got ndstrace running. This
> system has a client application that seems to be doing the equivalent
> of:
>
> Code:
> --------------------
>
> get all users - search by object class
> for-each user
> search for groups that this user is a member of
> end-for-each
>
> --------------------


Um..... that sounds horrible. Why in the world would that ever be done?
I think you should introduce them to IDM, and the idea of
event-driven-ness, to stop this insanity.



Update: Not only is this application's search behavior awful, it seems to be the trigger for this problem, and it also triggers a memory leak in ndsd. Yay.

I captured the whole thing to a trace file. I can recreate both searches using ldapsearch.

If I do only the second one, in a loop (22K times), it takes a bit of time, but nothing interesting happens.

If I do the first one (find 22K users), ndsd jumps about 8M of RAM (watching RES in top).

If I then repeat the looped search, ndsd jumps about 150M. Each time I run the loop, I lose another 150M.

After doing this, ndstrace no longer works.

Next up, narrowing it down to see if I can find the trigger. Both searches are kinda nasty. But it seems that the first one is what sets it off, after which the second one just makes it worse.
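
For anyone wanting to repeat the measurement without staring at top: a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line (which should correspond to the RES column in top). The function names are made up for illustration and are not part of any eDirectory tooling.

```python
import re
from pathlib import Path


def parse_vmrss_kib(status_text: str) -> int:
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1))


def read_rss_kib(pid: int) -> int:
    """Read the current resident set size of a running process (Linux only)."""
    return parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())


def growth_mib(before_kib: int, after_kib: int) -> float:
    """Resident memory growth between two samples, in MiB."""
    return (after_kib - before_kib) / 1024
```

The idea would be to sample read_rss_kib() for the ndsd process before and after each ldapsearch run and compare the two with growth_mib(), rather than eyeballing the numbers.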

Re: ndstrace just ... stops?

Just some thoughts:

Are you using paged searches, or server-side sort (SSS), or virtual list
view (VLV) controls? If so are you abandoning any searches prematurely?
Does it matter if the user(s) has/have Universal Password (UP) setup?


Re: ndstrace just ... stops?

ab;2495704 wrote:
Just some thoughts:

Are you using paged searches, or server-side sort (SSS), or virtual list
view (VLV) controls? If so are you abandoning any searches prematurely?
Does it matter if the user(s) has/have Universal Password (UP) setup?



"No" to all of the above. It's a simple search for users:


(|(objectclass=person)(objectclass=organizationalPerson)(objectclass=inetOrgPerson))


with a big long list of attributes to return. I think a few of them don't actually exist in the schema, but I haven't checked them all yet. Base is "ou=users,o=client". Subtree search. Simple.

Then for-each through the users found in the first one, doing a search like:


(|(&(objectclass=groupOfUniqueNames)(uniquemember=cn=bob,ou=people,o=client))(&(objectclass=groupOfNames)(member=cn=bob,ou=people,o=client))(&(objectclass=posixGroup)(|(memberUid=bob)(memberUid=xbob))))


The first search finds 15K or so users. The second search is 15K searches, one for each user.
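
To make the per-user search concrete, here is a rough sketch of how that group filter could be generated for each user and turned into an ldapsearch command line. Assumptions: an OpenLDAP-style ldapsearch client (the one bundled with eDirectory may take different options), an anonymous simple bind, and a placeholder server URI. The helper names are hypothetical, and this is not the application's actual code.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 requires escaping in filter values."""
    replacements = {
        "\\": r"\5c",
        "*": r"\2a",
        "(": r"\28",
        ")": r"\29",
        "\x00": r"\00",
    }
    return "".join(replacements.get(ch, ch) for ch in value)


def group_filter(user_dn: str, uids: list[str]) -> str:
    """Build the 'which groups is this user in' filter shown above."""
    dn = escape_filter_value(user_dn)
    uid_terms = "".join(f"(memberUid={escape_filter_value(u)})" for u in uids)
    return (
        "(|"
        f"(&(objectclass=groupOfUniqueNames)(uniquemember={dn}))"
        f"(&(objectclass=groupOfNames)(member={dn}))"
        f"(&(objectclass=posixGroup)(|{uid_terms}))"
        ")"
    )


def ldapsearch_argv(uri: str, base: str, ldap_filter: str, attrs: list[str]) -> list[str]:
    """Argument list for a subtree search (OpenLDAP client options assumed)."""
    return ["ldapsearch", "-x", "-LLL", "-H", uri, "-b", base, "-s", "sub",
            ldap_filter, *attrs]


# One search per user found by the first query: N users means N searches.
argv = ldapsearch_argv(
    "ldap://ldap.example.com",  # placeholder URI, not from the original post
    "o=client",                 # assumed base for the group search
    group_filter("cn=bob,ou=people,o=client", ["bob", "xbob"]),
    ["cn"],
)
```

Passing the list to subprocess.run() once per user would reproduce the loop. The escaping step matters if any DN contains parentheses or asterisks.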

If I do just the second search, using ldapsearch only, so nothing special, nothing interesting happens.

If I do just the first search, I see ndsd grow by 15M or so.

If I then do the second search, I see ndsd grow by 150M.

This is 100% repeatable, using nothing more than those two search filters, a list of attributes, and ldapsearch. Once this is triggered, I can watch ndsd continue to grow in size. It looks like every LDAP operation leaks a bit more memory.

I don't know how long this has been going on for, I suspect a while. I only noticed because I started getting complaints about the load average on this host being too high for too long. Found that ndsd was being hammered, because that second search was using non-indexed "member" in the filter. The second set of searches took about 20 minutes to complete. The application runs the trawl, waits five minutes, then does it again. So it was doing it about 2-3 times an hour. Lose 300M per hour, it takes a little while to lose enough to notice.

I fixed it. I added an index for member. Now the second set of searches runs in about a minute. Now it can trawl for data 10 times an hour.

Ugh.

So we've calmed it down by telling it to wait longer between trawls. Next up is figuring out what's causing the leak. I suspect the list of attributes in that first query, some of which I think are not defined in the schema.
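
The numbers above can be sanity-checked with a little arithmetic. A back-of-the-envelope sketch, assuming each trawl leaks a fixed ~150M (as seen in the ldapsearch repro) and that one cycle is simply run time plus wait time. The function names are made up for illustration.

```python
def trawls_per_hour(run_minutes: float, wait_minutes: float) -> float:
    """How many full trawl cycles (run + wait) fit into one hour."""
    return 60 / (run_minutes + wait_minutes)


def leak_per_hour(run_minutes: float, wait_minutes: float,
                  leak_per_trawl_m: float = 150) -> float:
    """Estimated memory lost per hour, in the same units as leak_per_trawl_m."""
    return trawls_per_hour(run_minutes, wait_minutes) * leak_per_trawl_m


# Before the index: ~20 minute trawl, 5 minute wait.
before_rate = trawls_per_hour(20, 5)   # 2.4 trawls an hour
before_leak = leak_per_hour(20, 5)     # roughly 360M an hour

# After the index: ~1 minute trawl, same 5 minute wait.
after_rate = trawls_per_hour(1, 5)     # 10 trawls an hour
after_leak = leak_per_hour(1, 5)       # roughly 1500M an hour
```

So if the per-trawl leak really is constant, making the search faster multiplies the leak rate roughly fourfold, which is why stretching the wait between trawls was the immediate workaround.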

Re: ndstrace just ... stops?

al_b;2470825 wrote:
I have had "similar" issues in the past (2-3 times).
No driver traces were sent to the iMonitor screen. A driver restart didn't help.

After an eDirectory restart, the issue disappeared for a number of months.


This seems to happen about 15 minutes after startup.
hendersj Acclaimed Contributor.

Re: ndstrace just ... stops?

On Tue, 28 Nov 2017 16:46:03 +0000, dgersic wrote:

> eDir 8.8.8.9 HF2
>
> Ever seen where ndstrace just stops doing anything? I can go in, change
> the selected filters with "set dstrace=+thing" or whatever, but there is
> no activity shown at all. It works ok initially after startup, then just
> ... stops.
>
> And, yeah, there's actual activity going on. Even something simple like
> an authentication server doing lots and lots of LDAP traffic, with +ldap
> enabled, shows nothing. A server running a couple of dozen IDM drivers,
> all with trace level 3, shows nothing with +dxml +dvrs.


The only time I've ever seen anything like this was back in the NDS 6
days - and the result was due to a defect that caused the DSA to become
completely nonresponsive under heavy load.

If the DSA is responsive but trace isn't doing anything, sounds like a
completely different sort of issue than what I saw.

Jim

--
Jim Henderson, CNA6, CDE, CNI, LPIC-1, CLA10, CLP10
Novell/SUSE/NetIQ Knowledge Partner