Migrating users in IDM

There are many things to learn in Novell (I guess now it is NetIQ, going to take me a while to get used to that) Identity Manager. I have tried to address some of the basic ones, like XPATH (Examples of using XPATH in Identity Manager), driver walkthroughs (Detailed driver walkthrough collection), and many, many more articles. If you are new to Identity Manager, I would highly recommend you read David Gersic's excellent series on the basics of how events flow through the policies:

Then to understand how to troubleshoot all that, read this series by a guy at Novell Technical Support, which is the best I have ever seen on this topic:

Finally, there is another side to IDM, the Provisioning side, and this introduction to how Provisioning Request Definitions work ought to help you get the lay of the land:

But having written all that, the other day there was an interesting article posted that hit a point I realized I wanted to address.

In the article Migrating masses of users to IDM, the question comes up about how you would turn on an Identity system for the first time. There are many issues involved, but the one this article focuses on is how to get all the users to become a part of the Identity system. In IDM speak, this would be called getting them all associated. The standard approach to getting all the users associated is to migrate them.

First up, what are associations, and why do they matter? Well, one of the interesting things about the approach Novell took to IDM (before NetIQ) is that the metadata about the user, in terms of what systems they are connected to, is stored in eDirectory as an attribute of the user, as opposed to linked tables or another database. This has its obvious advantages and disadvantages. However, overall it is a reasonable system. I have had discussions in both directions, and it may be getting to a point where this is not as scalable as we would like, but for the most part it is working well.

Thus the DirXML-Associations attribute on the user is multivalued (one value per connected system) and has three components, since it is a Path syntax attribute. It has a 32 bit integer (of which we use 0-5 as common values) that holds the association state, a DN syntax part (for storing the DN of the driver object, thus identifying which driver this value is for), and a string for storing the actual association value.

You can see a list of what each driver considers the association value in the following article: Open Call - IDM Association Values for eDirectory Objects

But at its simplest, it is a unique identifier in the target system, something that if all else fails will uniquely identify this object in that system. But first we have to get it established as part of turning on the Identity system. Remember that this is per driver, so in a system with 15 connected systems a single user might have as many as 15 DirXML-Associations values, one for each driver. Thus in order to start up your Identity system you might need to get users associated with multiple drivers.

When you look at the Open Call link for DirXML-Associations values you will see it is often the GUID in the target system, since GUID by definition means Globally Unique Identifier. As the name says, globally unique. Usually it is a randomly generated number in a 128 bit namespace, which is the sort of namespace that has more values than atoms exist in the universe kind of thing. (I may be exaggerating, but you get the idea. 128 bits is a HUGE namespace. 32 bits is about 4 billion (4 x 10^9), so 128 bits is 4 billion times 4 billion times 4 billion times 4 billion, which is on the order of 10^38 (1 followed by 38 zeros!)). When there is no easily available GUID, there must always be something else that is unique. In a database that is usually the Primary Key in the table and schema of the object. In a system with a flat namespace for user names, often the user name is sufficient (like in Unix NIS systems). Whatever is chosen, it should never match more than one object in the target system.
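To put actual numbers on the size of those namespaces, here is a quick Python check of the arithmetic. It is only a sanity check of the powers of two, nothing IDM specific.

```python
# Compare the size of a 32 bit namespace with a 128 bit one.
bits32 = 2 ** 32
bits128 = 2 ** 128

# A 128 bit namespace is four 32 bit namespaces multiplied together.
assert bits128 == bits32 ** 4

print(f"2^32  = {bits32:,}")               # a bit over 4 billion
print(f"2^128 = {bits128:.3e}")            # scientific notation
print(f"2^128 has {len(str(bits128))} decimal digits")
```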

The driver shim is responsible for providing an association value on any events that come out of the Publisher channel. For associated users, the engine will provide the correct value, taken from the attribute, when an event comes from the engine on the Subscriber channel.

You can see it in the trace as an <association>someValue</association> node that is the immediate child of the event node (you know, the <add>, <modify>, etc. nodes).
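If you have captured an event from trace and want to pull the association out of it programmatically, a minimal Python sketch might look like the following. The sample document is made up for illustration (the class name, the attribute, and the value userGeoffc are placeholders), and it assumes you have already trimmed the trace output down to a single well formed XDS document.

```python
import xml.etree.ElementTree as ET

# Hypothetical event shaped like what trace shows; all values are made up.
SAMPLE_TRACE = """
<nds dtdversion="3.5" ndsversion="8.x">
  <input>
    <modify class-name="User">
      <association>userGeoffc</association>
      <modify-attr attr-name="Surname"/>
    </modify>
  </input>
</nds>
"""


def find_associations(xds_text):
    """Return (event tag, state attribute, association value) per event node."""
    root = ET.fromstring(xds_text.strip())
    found = []
    for node in root.iter():
        # find() only looks at immediate children, which matches where
        # the <association> node sits under an event node.
        assoc = node.find("association")
        if assoc is not None:
            found.append((node.tag, assoc.get("state"), (assoc.text or "").strip()))
    return found


for tag, state, value in find_associations(SAMPLE_TRACE):
    print(tag, state, value)
```

The state comes back as None when the node carries no state attribute, as in this sample.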

So how do you proceed? You have your newly stood up Identity system, with several drivers, and now it is time to get it rolling. Well, how to proceed depends on the connected systems. Each user in the system will require an appropriate DirXML-Associations value, one per driver.

Some are easy to 'make' by hand, once you understand what the DirXML-Associations value ought to be, and you might just want to construct an LDIF to add them to your existing users. For example, say you have a NIS driver (NIS being the Unix based system for distributed authentication), where the association value is the object class followed by the username, so userGeoffc or maybe groupUsersGroup. You can easily read back the DNs of your users into an LDIF, then add in a line that adds a DirXML-Associations value of cn=DriverName,cn=DriverSet,ou=ou,o=o#1#userGeoffc, which is how LDAP shows a structured attribute of Path syntax (the DN first, then the integer, then the string). As discussed above, that says the value is for the driver object cn=DriverName,cn=DriverSet,ou=ou,o=o (make sure the DN is valid or it isn't going to work. Watch out, the Driver and Driver Set objects are named by cn= not ou= even though they look like they are containers), with a state of 1 (associated), and then finally the userGeoffc value.
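As a rough sketch of scripting such an LDIF, the Python below builds one modify record per user. The user DNs, the driver DN, and the helper names are all placeholders I made up for illustration, and it assumes the LDAP string form of the value is driverDN#state#value, so compare the output against an already associated user in your own tree before importing anything.

```python
ASSOC_ATTR = "DirXML-Associations"  # attribute name as it appears in the eDirectory schema
STATE_ASSOCIATED = 1


def association_ldif(user_dn, driver_dn, value, state=STATE_ASSOCIATED):
    """Return one LDIF modify record that adds an association value.

    Assumes plain ASCII DNs and values; anything else would need
    base64 encoding, which this sketch does not handle.
    """
    assoc = f"{driver_dn}#{state}#{value}"
    return "\n".join([
        f"dn: {user_dn}",
        "changetype: modify",
        f"add: {ASSOC_ATTR}",
        f"{ASSOC_ATTR}: {assoc}",
        "",  # trailing newline so joined records end up separated by a blank line
    ])


def build_ldif(users, driver_dn):
    """users is an iterable of (user DN, association value) pairs."""
    records = [association_ldif(dn, driver_dn, value) for dn, value in users]
    return "version: 1\n\n" + "\n".join(records)


if __name__ == "__main__":
    # Hypothetical driver and users, for illustration only.
    driver = "cn=DriverName,cn=DriverSet,ou=ou,o=o"
    sample_users = [
        ("cn=geoffc,ou=users,o=o", "userGeoffc"),
        ("cn=jsmith,ou=users,o=o", "userJsmith"),
    ]
    print(build_ldif(sample_users, driver))
```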

In the case of the Active Directory driver it is marginally trickier, as the association value is not the raw GUID, which is stored as an octet string in AD, but rather a specific representation, where the base 64 decoded value has some of the pairs of digits flipped (endianness reversed). I was surprised, but there is apparently a standard for this sort of thing and Novell is following it. Seems kind of dumb, but there really is one, which you can read about here: http://en.wikipedia.org/wiki/Guid
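To see what that flipping looks like, here is a small Python sketch that takes the base 64 form of an objectGUID (as you would see it in an LDIF export from AD) and shows the raw byte order and the mixed endian order side by side. The function name is my own, and which of these forms a given driver version expects as its association value is something to verify against a user the driver has already associated, not something this sketch decides for you.

```python
import base64
import uuid


def guid_forms(b64_object_guid):
    """Decode a base64 objectGUID and return it in several hex forms."""
    raw = base64.b64decode(b64_object_guid)
    if len(raw) != 16:
        raise ValueError("a GUID must be exactly 16 bytes")
    # bytes_le treats the first three fields (4, 2 and 2 bytes) as little
    # endian and flips them; the last 8 bytes are left in their stored order.
    flipped = uuid.UUID(bytes_le=raw)
    return {
        "raw_hex": raw.hex(),             # bytes exactly as stored
        "mixed_endian_hex": flipped.hex,  # first three fields byte swapped
        "display": str(flipped),          # the familiar dashed form
    }


if __name__ == "__main__":
    # Made-up GUID (bytes 00..0f) so the flipping is easy to see.
    sample = base64.b64encode(bytes(range(16))).decode("ascii")
    for name, text in guid_forms(sample).items():
        print(f"{name:17} {text}")
```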

For example, at a client with 1.5 million objects, we added a new driver for 600,000 of them. Rather than let the engine try to process 600K objects, we actually scripted retrieving the GUIDs, processing them, and then using an LDIF to write the values to the Identity Vault.

In the article being discussed the site 'only' had 10,000 users, so it is not so bad and they could use the migrate function and just let it chug away. In that case, each object (since more than Users or Groups can be synchronized) runs through all the policy in the driver, and the matching rules are used to try and find the matching object.

Now if you watch in DSTrace, you can see how long a user takes to process. Alas, this is very annoying, as just turning on DSTrace slows down the engine significantly. The XML you see in DSTrace is stored in memory in a pretty efficient format that the engine can traverse and manage. (Thus XPATH, the XML Path language, can find the path to the part of the document you care about.) Just getting the engine to convert from that format to text to show as XML is expensive in terms of processing power, and you may find that turning trace all the way off (no value at all, as opposed to 0, which is what you would expect) will actually speed it up quite a bit, by as much as a factor of 2 to 10 depending on what you do.

I have had a long discussion on how to tune for DSTrace performance that I empirically worked through in a series on Toolkit rules. Those are rules that, when triggered, look at the data in the directory and try to compare or fix it. (I have used this to fix User Principal Names in Active Directory, or to make sure everybody's uniqueID in eDirectory really is unique.) However, these are very sensitive to having trace enabled. But it turns out that since you can turn off trace at many levels, if you can identify the expensive bits, turning trace off there but leaving it on otherwise can help performance a lot. You can read more of my thoughts on Toolkit rules in this series of articles:

On top of all that, you should realize that if you migrate a user in one connected system, odds are good you will cause enough of an event, as the data on both sides is updated to match according to the rules defined in the filter (Merge Authority in this case), to cause some or all of the other drivers to fire. In which case you might want to think carefully about the overall processing load.

Above we talked about generating the association values by hand and just setting them on users. But how would you do it the 'proper' way? Well there you have two options. The most common is via iManager, once you select the Driver object of interest from the Identity Manager overview page. There will be a drop down that allows you to choose Migrate from Identity Vault or Migrate from Application.

These two options let you try and define what to migrate based on either eDirectory (the Identity Vault) or something in the connected system (From Application). While the end result is mostly the same, they act very differently.

When you Migrate from Identity Vault, you select a list of users or objects, and iManager basically writes a DirXML-Associations value to the object with a state of 4 (0 is ignore, 1 is associated, 4 is migrate; 3, 5, 6 and the rest of the way to 2^32 are not really used), which causes an event that triggers the driver to process the object.

You will see in the trace that the <association state="migrated">someValue</association> is set, which is how the engine detects this case.

You can do this manually yourself by modifying the DirXML-Associations value for this user to have a state of 4, which means you can also script this via LDAP if you would like.
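A minimal sketch of scripting that state change as an LDIF follows. It only covers a user that already has a value for the driver, swapping the old value for one with a state of 4 in a single modify. The DNs and helper name are placeholders of mine, the driverDN#state#value string form is an assumption to check against your own tree, and I would try it on a single test user before pointing it at thousands.

```python
ASSOC_ATTR = "DirXML-Associations"  # attribute name as it appears in the eDirectory schema
STATE_ASSOCIATED = 1
STATE_MIGRATE = 4


def migrate_ldif(user_dn, driver_dn, value, current_state=STATE_ASSOCIATED):
    """Return an LDIF record that flips an existing association to state 4."""
    old = f"{driver_dn}#{current_state}#{value}"
    new = f"{driver_dn}#{STATE_MIGRATE}#{value}"
    return "\n".join([
        f"dn: {user_dn}",
        "changetype: modify",
        f"delete: {ASSOC_ATTR}",   # remove the value carrying the old state
        f"{ASSOC_ATTR}: {old}",
        "-",
        f"add: {ASSOC_ATTR}",      # add it back with the migrate state
        f"{ASSOC_ATTR}: {new}",
        "-",
        "",
    ])


if __name__ == "__main__":
    # Hypothetical user and driver, for illustration only.
    print(migrate_ldif(
        "cn=geoffc,ou=users,o=o",
        "cn=DriverName,cn=DriverSet,ou=ou,o=o",
        "userGeoffc",
    ))
```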

If you migrate FROM application in iManager, the engine submits a query event as a command (query-ex if the driver supports it, in which case it pages according to the default set by the ECV (Engine Control Value) named dirxml.engine.max-migrate-app-count), and the <instance> document that is returned is the list of objects that the engine converts into <sync> events. These <sync> events then run through the Publisher channel and try to match up objects.

Using DXCMD (the cross platform Java command line application) there is an option to Submit a Command, which will accomplish this task in case you wish to script it. dxcmd expects that you will provide the name of a text file containing the XDS doc with the query you need, and it will process it as if you did it in iManager. So you can have a little better control than iManager allows, but it has to be a valid XDS query of course.

This issue came up in a SOAP driver for Salesforce.com, that I discussed in this series of articles:

where using iManager you could not generate the specific query needed to get just the right set of users, but it was very easy to craft a <query> document that would do exactly what was needed. Doing this actually made the migrate process an order of magnitude faster, as the simple-minded query that iManager generated returned too much data, which had to be converted from SOAP to XDS, only to be thrown away and then queried for again. By crafting a smarter query that only returned the one needed attribute, the total time needed dropped immensely.

Such a document can be very simple, and might look something like this to just query for all users:

<nds dtdversion="3.5" ndsversion="8.x">
  <source>
    <product version="?.?.?.?">DirXML</product>
    <contact>Novell, Inc.</contact>
  </source>
  <input>
    <query class-name="User" scope="subtree"><search-class class-name="User"/></query>
  </input>
</nds>

Anything you can do in a query, you should be able to submit in this fashion. My examples of complicated migrates are very specific to an implementation so they are not that interesting in general. But you can watch when you generate a migrate from application in iManager, and capture the query event in trace and try to modify it to do what you need.
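If you would rather generate such a document than hand edit it, something like the following Python sketch can build it, including a restriction on which attributes come back. The helper name and the CN attribute are just examples of mine. It assumes the usual XDS layout with <source> and <input> wrappers, and that a <read-attr> element limits the attributes returned, so check the result against a query captured from your own trace.

```python
import xml.etree.ElementTree as ET


def build_query(class_name, read_attrs, scope="subtree"):
    """Build an XDS query document as a string.

    read_attrs lists the attributes the query should ask for; each one
    becomes a <read-attr attr-name="..."/> child of the <query>.
    """
    nds = ET.Element("nds", {"dtdversion": "3.5", "ndsversion": "8.x"})

    source = ET.SubElement(nds, "source")
    ET.SubElement(source, "product", {"version": "?.?.?.?"}).text = "DirXML"
    ET.SubElement(source, "contact").text = "Novell, Inc."

    wrapper = ET.SubElement(nds, "input")
    query = ET.SubElement(wrapper, "query", {"class-name": class_name, "scope": scope})
    ET.SubElement(query, "search-class", {"class-name": class_name})
    for attr in read_attrs:
        ET.SubElement(query, "read-attr", {"attr-name": attr})

    return ET.tostring(nds, encoding="unicode")


if __name__ == "__main__":
    # Ask for all User objects but only their CN (an example attribute).
    print(build_query("User", ["CN"]))
```

Save the output to a text file and hand that file name to dxcmd's Submit a Command option as described above.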

