Strange IDM Driver deployment error

There are many errors you might see in NetIQ Identity Manager, and at some level it bugs me how few of them are noted in the documentation. In principle the Knowledge Base (TIDs, Technical Information Documents) ought to have more, and often does, but it seems like the errors noted in TIDs should get migrated into the docs at some point. To be fair, some of the driver documentation has started coming with more error examples added, but if you have read any of my work before, you will know I am never satisfied. Which no doubt annoys some of the folk there, but such is life.

My approach to this issue has been that when I see an interesting error go by in trace, I snag it and store it in a text file. Then on the bus ride into work, or home, I start writing an article around the error. I try to explain how to trigger the error, the root cause (if I can figure it out), how I fixed it, and most importantly how I figured out what it meant and then how I figured out how to fix it. I hope this helps others in similar, or more especially analogous, cases. That is, I hope that the approach and thinking I used to figure out the solution can be applied in similar but not identical cases. Then of course, I get the article published, so the next poor sap who encounters the error and searches for the message in Google will find a useful result.

You can see some examples of articles like this below.

Sometimes the errors can be specific to the engine, as I wrote about in this series of articles:






Or they might be driver specific as in these articles:

Active Directory driver:




JDBC Driver:




SAP HR:




Heck even about an error in iManager:



I have a whole bunch more collected, I just need to find the time to write an article around them. Maybe I need to take the bus more often?

I strongly suggest that everyone do this, and heck, if you do not feel up to writing the article, send me the error to add to my collection, since this is the best way to learn that I can think of. But this way, everyone who ever experiences that same error can hopefully find some useful information about it.

The other day I was deploying a bunch of new drivers and ran into an error I have never encountered before. Alas, because I was working on a VM at a remote client through RDP to VNC via a VPN all over Citrix, I could not easily get the error message out of the system. (Ya, I hate working this way, but you do whatcha gotta do some days).

When I went to deploy the drivers I got some very strange results. There was an error in Designer that I had never seen before, and I regret not capturing it.

I ignored it (well, you know, sometimes the errors are just cosmetic, and one can dream that this one was just cosmetic) and tried to start the driver. However, it very quickly errored and the driver would not start.

The error looked like this below:

[12/27/13 14:18:17.459]:SAP-BizLogic :Found subscriber IDM\Services\driverSet\SAP-BizLogic_2.
[12/27/13 14:18:17.460]:SAP-BizLogic :Found subscriber IDM\Services\driverSet\SAP-BizLogic\Subscriber.
[12/27/13 14:18:17.460]:SAP-BizLogic :
DirXML Log Event -------------------
Status: Error
Message: Code(-9071) Driver is misconfigured: Code(-9007) Multiple DirXML-Subscriber objects were found under the DirXML-Driver object.
[12/27/13 14:18:17.461]:SAP-BizLogic :
DirXML Log Event -------------------
Status: Error
Message: Code(-9070) Unable to start DirXML driver : com.novell.nds.dirxml.engine.VRDException: Code(-9007) Multiple DirXML-Subscriber objects were found under the DirXML-Driver object.
at com.novell.nds.dirxml.engine.ConfigAbstraction.commonConstruct(ConfigAbstraction.java:535)
at com.novell.nds.dirxml.engine.ConfigAbstraction.<init>(ConfigAbstraction.java:397)
at com.novell.nds.dirxml.engine.DriverEntry.<init>(DriverEntry.java:131)
at com.novell.nds.dirxml.engine.DriverEntry.startDriver(DriverEntry.java:108)
at com.novell.nds.dirxml.engine.DriverEventMonitor$DriverStateHandler.handleEvent(DriverEventMonitor.java:274)
at com.novell.nds.events.EventNotification.processEvent(EventNotification.java:845)
at com.novell.nds.events.EventNotification.processEvents(Native Method)
at com.novell.nds.events.EventNotification.access$500(EventNotification.java:56)
at com.novell.nds.events.EventNotification$EventThread.run(EventNotification.java:1149)
at java.lang.Thread.run(Unknown Source)


The first thing to realize is that although the Status is set to Error, this is effectively a fatal error, because the driver failed to start. Functionally that is similar to a <status level="fatal"> style event, though technically it is a different case. Normally, if a status event with a level of fatal happens, a running driver will stop. In this case the driver never gets going at all, and you get an error in either Designer or iManager that the driver failed to start.

It is nice getting that feedback, instead of starting, getting told by Designer or iManager that it started, and then finding in trace that the driver had a fatal error and stopped silently.

Anyway, in this case, you get a fair bit of information in the error. The core is the error code -9071. I look on the documentation page for this error, as is a good first step:

Novell Error Codes

However, this error set is not there. No 9071 to be found, and the 9007 error it has listed is for GroupWise. Ah well, worth a shot, checking the docs. However, this code turns out to have a description with it, so that helps even more. The text says:

Code(-9071) Driver is misconfigured: Code(-9007) Multiple DirXML-Subscriber objects were found under the DirXML-Driver object.
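
Since these messages nest one code inside another, it can help to pull all of the codes out of a trace line before you go searching for them. Here is a minimal sketch in Python, assuming only the Code(-NNNN) format visible in the message above. The function name is my own invention, not anything from the engine.

```python
import re

# Matches the "Code(-9071)" style tokens seen in the status message above.
CODE_PATTERN = re.compile(r"Code\((-?\d+)\)")


def extract_codes(message):
    """Return every numeric code in the message, outermost first."""
    return [int(value) for value in CODE_PATTERN.findall(message)]


message = (
    "Code(-9071) Driver is misconfigured: Code(-9007) Multiple "
    "DirXML-Subscriber objects were found under the DirXML-Driver object."
)

# The outer code is the generic "misconfigured" wrapper; the inner one
# carries the specific cause.
print(extract_codes(message))
```

The last code in the list is usually the one worth searching on, since the outer ones just wrap it.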

More than one DirXML-Subscriber object under the DirXML-Driver object? What on earth? How could I do that? What does that even mean?

Well a little bit of history is needed here, to understand how a driver object in eDirectory is stored. You can read a fair bit more on this topic in this article:



The original approach for DirXML was that you have a DirXML-DriverSet object. Under it you have any number of DirXML-Driver objects. Each driver object would have the attributes DirXML-InputTransformation and DirXML-OutputTransformation, which are DN references to the first policy or stylesheet object in that set. Similarly, there would be a DirXML-Subscriber and a DirXML-Publisher object, each of which has the attributes:
DirXML-EventTransformationRule
DirXML-CommandTransformationRule
DirXML-MatchRule
DirXML-CreateRule
DirXML-PlacementRule

Each of those single valued DN reference attributes would point at the object (Policy or Stylesheet) that was first in line. Then each policy or stylesheet object would have a DirXML-NextTransformation attribute that would point to the next object in the list.

However, this approach changed a bit with IDM 3.5, because there was one major flaw. There was no good way to reuse a policy object. That is, if you reused a policy, it would link to the rest of the list of policies, which was probably not what you meant.

I.e. if you had one policy you wanted to reuse in both the Input Transform policy set and, say, the Command Transform policy set, linking it in twice meant the second set would 'inherit' the rest of the first set's queue, and there was no way to avoid that.
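
To make the flaw concrete, here is a toy model of the old linkage in Python. It is only a sketch: the dictionary stands in for the DirXML-NextTransformation pointers, and the policy names are made up for illustration.

```python
# Each policy has exactly one "next" pointer, standing in for the
# DirXML-NextTransformation attribute. All names are hypothetical.
next_policy = {
    "itp-first": "shared-policy",
    "shared-policy": "itp-last",  # the shared policy can only point one way
    "itp-last": None,
    "ctp-first": "shared-policy",
}


def walk_chain(first):
    """Follow the next pointers from the first policy to the end."""
    chain = []
    current = first
    while current is not None:
        chain.append(current)
        current = next_policy[current]
    return chain


input_chain = walk_chain("itp-first")
command_chain = walk_chain("ctp-first")

print(input_chain)
print(command_chain)  # drags in "itp-last", which belongs to the other set
```

Because the shared policy has only one next pointer, the Command Transform chain ends up running the tail of the Input Transform chain as well.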

With IDM 3.5 and higher, the policy linkage model was totally revamped and the old attributes were basically retired and stopped being used, being replaced by the single attribute DirXML-Policies, a multivalued Typed Name attribute. Typed Name attributes have a DN reference, and then 2 integers.

They took the Fishbone diagram, assigned each place you might link in an object (policy sets, Global Configuration objects, ECMA Script objects, etc) an integer value starting at 0, all the way up to 14. (In IDM 4.02 Patch 3, they added 15 and 16 for startup and shutdown policy sets and if you need a laugh, look at this error I ran into, by doing a series of dumb things, that exposed it nicely: http://www.netiq.com/communities/cool-solutions/bidir-edir-driver-idm-array-index-out-of-bounds-error )

Thus the first integer for each DN is the placeholder integer value. The second integer tells you its position within that policy set, thus ordering them properly.
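
Here is a rough sketch in Python of how that ordering works out. The DNs and the slot numbers are invented for illustration (I am not claiming 8 or 10 map to any particular policy set), and the tuples are just a stand-in for the Typed Name values, not how any tool actually reads them.

```python
from collections import defaultdict

# Each value mimics a Typed Name: (DN, slot integer, position integer).
# The DNs and slot numbers below are hypothetical.
values = [
    ("cn=veto-disabled,cn=Subscriber", 8, 1),
    ("cn=shared-cleanup,cn=Library", 8, 0),
    ("cn=shared-cleanup,cn=Library", 10, 0),
    ("cn=set-defaults,cn=Subscriber", 10, 1),
]


def policy_sets(typed_names):
    """Group by slot, then order each group by position."""
    grouped = defaultdict(list)
    for dn, slot, position in typed_names:
        grouped[slot].append((position, dn))
    return {
        slot: [dn for _, dn in sorted(members)]
        for slot, members in grouped.items()
    }


ordered = policy_sets(values)
for slot in sorted(ordered):
    print(slot, ordered[slot])
```

Note that the library policy shows up in two different slots without dragging anything else along with it, which is exactly what the old next-pointer model could not do.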

This is what allowed the use of Library objects to store policy objects, link them properly, and reuse policies and stylesheets. Later, in IDM 4, this also made managing packages a lot easier, since they could simply extend the model to include a placeholder for GCV objects and whatnot as needed for new features.

However, they did NOT deprecate the container objects, DirXML-Subscriber and DirXML-Publisher, since logically it was still helpful to be able to segregate the policies into three different containers (the Driver itself, then the two sub-containers).

In my case, if you look a little higher up in the trace (where I started the snipping) you can see this message:
[12/27/13 14:18:17.459]:SAP-BizLogic :Found subscriber IDM\Services\driverSet\SAP-BizLogic_2.
[12/27/13 14:18:17.460]:SAP-BizLogic :Found subscriber IDM\Services\driverSet\SAP-BizLogic\Subscriber.


So the error says, found multiple subscriber objects, and lo and behold, yep it found two of them. One named 0_2, and another named Subscriber (as it should have been). That is interesting. Even more so, each contained some policies under them. That is, of the say 10 policies expected in the Subscriber container, 8 were in eDirectory under Subscriber, and 2 were under the 0_2 container.

If you have never seen eDirectory replication error objects, then welcome to the world of problematic replication. A replication collision happens when you create an object on Replica 1, then before it synchronizes you write it again on Replica 2. When the replicas finally replicate to each other, you have a collision. How to handle it?

Well, eDirectory takes the one with the earliest timestamp and sequence number (since the timestamp has only one-second granularity, and you might have multiple events a second) and keeps that, and renames the second one to the X_Y format. The more collisions, the higher the numbers. I remember there is actually a meaning to each number, but I forget the details, and I cannot find it easily.
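
The X_Y naming makes these renamed objects easy to pick out of a list of child object names. A minimal sketch in Python, assuming you have already pulled the names with whatever tool you like. The pattern is just my reading of the X_Y format described above, and the helper name is hypothetical.

```python
import re

# Digits, an underscore, more digits, and nothing else: 0_2, 0_4, 1_3 ...
COLLISION_NAME = re.compile(r"^\d+_\d+$")


def find_collision_names(child_names):
    """Return the names that look like collision-renamed objects."""
    return [name for name in child_names if COLLISION_NAME.match(name)]


# Child names under a driver object, as in the case described here.
children = ["Subscriber", "Publisher", "0_2"]
print(find_collision_names(children))
```

Anything this turns up under a driver object is worth a look before you try starting the driver.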

Anyway, I had noticed that in this tree, when I was deploying the drivers, Designer was taking forever, pausing after each object to report that it was waiting for the object to replicate to the other server. This became a real issue during the first half of the rollout, where it was taking several hours to import and deploy 3 drivers. (Some of that was the VM I was working on, and a reboot helped somewhat. It was still slow for the second half, but much faster.)

I was trying to think about why Designer would be waiting for the object to replicate to the other server. The best I can come up with is that there are some Per Replica attributes on the DirXML-Subscriber class object. I took a look in schema, and would you look at that, there they are:
DirXML-LastLogTime
DirXML-StatusLog
DirXML-Timestamp
DirXML-Timestamp2

Per Replica flagged attributes are really quite goofy in some ways. Remember, eDirectory is a distributed, replicated database, yet it lets you flag an attribute as do not replicate. That seems a bit backwards. I have no idea what use case they had in mind when they added this to schema, but it turns out to be pretty useful for IDM. All the connection info, named passwords, etc. are Per Replica attributes on the driver object, so that ServerA can point its drivers at a server in the same data center, while ServerB, perhaps hosted in the DR site, can point its drivers at servers in the DR site. In case of failover, the different configurations for the same driver, stored in the same attributes, will each work. It seems a bit goofy at first but it really does seem to work. It does mean that it is possible to lose information if you have only a single replica of the set, since the connection information is not replicated and is stored only in that server's replica.

The fix is pretty easy. Go to each driver with this issue and find the offending 0_2 or 0_4 object. Delete its contents and then the object itself via iManager, LDIF, ApacheDS, ConsoleOne, or whatever tool you like. Then redeploy from Designer. This time, the Subscriber object already exists and is replicated, and all should be well.

The reason you have to delete the contents first is that prior to eDirectory 8.8 SP8 you could not issue a subtree delete command per se. Tools often can fake it, by getting the list of children, deleting them, and then deleting the parent. But in eDir 8.8 SP8 they added an LDAP extension to allow a single delete request to delete all children as well. A colleague of mine tested this with a million objects, and while it still took a couple of hours on his test VM at home, it completed much faster than a million and one (the objects plus their parent) delete operations.
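
The way tools fake a subtree delete boils down to a children-first walk. Here is a small sketch in Python against an in-memory stand-in for the tree. The object names are hypothetical, and delete_entry is a placeholder for whatever delete call your tool of choice really makes.

```python
def delete_subtree(children_of, dn, delete_entry):
    """Delete every child before its parent, one request per object."""
    for child in list(children_of.get(dn, [])):
        delete_subtree(children_of, child, delete_entry)
    delete_entry(dn)


# Hypothetical stray container holding two policies.
tree = {"0_2": ["policy-a", "policy-b"]}

deleted = []  # records the order the delete requests would be sent in
delete_subtree(tree, "0_2", deleted.append)
print(deleted)
```

That is one delete request per object plus one for the parent, which is why the single request subtree delete is so much nicer on big containers.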

Anyway, I thought this was an interesting error to share, that elucidates some of the underpinnings of how IDM works and can help with future troubleshooting. If you find an interesting error in trace, please take a copy of the message, and share it around in an article like this.
