Role and Resource Driver went Rogue
A driver was "stuck", so I restarted nds via ndsmanage. I have done this many times. This time, after the restart, the Role and Resource driver appears to have gone crazy: it started removing and then re-adding Roles/Resources tied to Dynamic Groups. The groups affected appear to be random. What would cause the RR driver to start doing removes and then adds for no reason? Also, some users were returned as members by the Dynamic Group LDAP query but had no nrfDynamicGroupMembership on their account, and thus no Role or Resource. This was a huge problem, as users lost a role that contains the AD group that grants web browsing. Please help.
I agree with Norbert. There was an important fix for Dynamic group assignments.
Some history to explain what is going on.
The Role Based Entitlements driver/Entitlement Service Driver allowed you to assign Entitlements to users based on Dynamic LDAP filters (which is functionally an LDAP Dynamic group). This was deprecated. I know they WANT to kill it badly, but too many people are still using it, so that battle continues.
The replacement is the ability to assign a Resource with an Entitlement to a Dynamic LDAP group (or a Role with the Resource linked) and have all the members get it.
Side note: a nested group is handled in some ways like a Dynamic group, even though it might technically be a Static group. (It makes more sense if you look at the objects of both types.)
But a change in Dynamic group membership does not generate an attribute-change event for a driver to react to.
So RRSD supports this by finding all assignments of anything to Dynamic groups and evaluating the member list (which is kind of quantum mechanical... Are you a member of a dynamic group? Well, only once you look...). It is then supposed to find all those users, check whether they have the needed assignments, and fix them.
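To make that reconcile step concrete, here is a minimal sketch in Python. It is not the actual RRSD code, only an illustration of the idea: compare what the dynamic group query returns right now against the users already recorded as members (for example via nrfDynamicGroupMembership) and work out the difference. The function name and the sample DNs are made up for this example.

```python
def reconcile(dynamic_members, recorded_members):
    """Return (to_add, to_remove) for one dynamic group.

    dynamic_members:  DNs returned by evaluating the group's query now.
    recorded_members: DNs already recorded as holding the membership.
    Both inputs are hypothetical stand-ins, not real RRSD data structures.
    """
    current = set(dynamic_members)
    recorded = set(recorded_members)
    to_add = sorted(current - recorded)      # in the query result, not yet recorded
    to_remove = sorted(recorded - current)   # recorded, but no longer in the result
    return to_add, to_remove


# Made-up sample data for illustration only.
query_result = ["cn=alice,o=org", "cn=bob,o=org", "cn=carol,o=org"]
recorded = ["cn=alice,o=org", "cn=dave,o=org"]

to_add, to_remove = reconcile(query_result, recorded)
print("add:", to_add)
print("remove:", to_remove)
```

The point of the sketch is that the whole decision hangs on the query result being complete. Anything missing from it looks like a user who left the group.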
But every patch or so, they somehow seem to break this. There was a major change to make this multithreaded, since this specific task is a slow one. It has to get the list of all members, then check every single one for its current state. It is unclear to me whether it is clever enough to read the user once and cache it, in case it comes up again on the next group or role or whatever. And badly formed Dynamic groups can be slow to return, and it all adds up. So moving this into its own thread should, in theory, help avoid blocking other tasks.
I think (does anyone know?) that part of the idea of the multithreaded RRSD was also to have multiple Dynamic lookup threads. But as always, multithreading can be tricky to get correct.
You’re right, but there was a major problem with some “older” version of eDirectory - 8.8.8 Pt. 8 (as far as I remember), which was only fixed in a patch for 9.0.1.
The problem was that the dynamic group query would not always return the correct number of members, which caused the RRSD to be a bit erratic. One query would return all 100 members of a group, the next could return 98, 78, or 82 members, and then it would return the right number again. This caused the RRSD to remove and re-add members seemingly at random.
This was fixed with the eDirectory patch and a patch for the RRSD. But then came the rewrite of the driver (as you mention), which re-introduced the same or a similar problem, and that was fixed with the latest patch.
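A small simulation shows why a short query result turns into remove/add churn. This is a hedged sketch with made-up user names and a hypothetical `churn` helper, not driver code: it simply diffs each query result against the previously recorded membership, which is the behaviour described above.

```python
def churn(initial_members, query_results):
    """Replay a series of query results and list the resulting events.

    Each result is diffed against the membership recorded after the
    previous pass, so a short result produces removes and the next
    full result produces the matching adds.
    """
    events = []
    recorded = set(initial_members)
    for result in query_results:
        current = set(result)
        for user in sorted(recorded - current):
            events.append(("remove", user))
        for user in sorted(current - recorded):
            events.append(("add", user))
        recorded = current
    return events


# 100 made-up users; the second query wrongly returns only 98 of them.
full = [f"user{i:03d}" for i in range(100)]
short = full[:98]

events = churn(full, [full, short, full])
for event in events:
    print(event)
```

Nothing changed on the users themselves, yet two of them lose and then regain their assignment, purely because one query came back incomplete.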
I see this error in the ndsd.log during the nds restart. I'm wondering if the restart made the RR driver go nuts or if it was already on its way there before the restart. Regardless, I have my ServiceNow driver stuck again, waiting for a query response, so the only way to get it going is a restart of ndsd. For now I am just going to make sure RR is set to manual start and monitor the queue after the ndsd restart.
By the way, has anyone had this experience with the ServiceNow driver getting stuck/queued up waiting for a SOAP response when querying for sys_id? This is happening more and more, and restarting ndsd is our only solution.
Leadership is losing faith in NetIQ/eDir big time, but I have told them we are behind on patches/versions and need to go to 4.8. Hopefully there will be fewer issues in 4.8 with RRSD and with the ServiceNow driver getting stuck querying.
novell.jclient.JCException: nameToID -632 ERR_SYSTEM_FAILURE
at com.novell.nds.dirxml.driver.nrf.NRFDynGroupUpdater.updateGroupMembers (NRFDynGroupUpdateer.java:359)