Let's talk about DirXML-Associations


Novell has done a pretty good job in the forums, organizing volunteers (who used to be called Sysops, but in this "i-everything" or "e-everything" age the name had to change, and they are now called Novell Knowledge Partners, or NKPs) to either answer the questions themselves or find someone they know who can help. If you have not used the forums when you need help, I highly recommend them! Search first, ask questions second, and provide details and trace in your questions.



I regularly read and post in the Identity Manager forums (as do a number of other helpful people) and can often help people out there in a fun way. This article started as a post I wrote to answer a question, which I realized would make a good standalone article for Cool Solutions.



The forums are available at http://forums.novell.com over a silly vBulletin web interface (Icky! Though it does work well for showing results when Google searching), but I prefer to use NNTP, the real protocol for newsgroups, dang it! Use your GroupWise client, Thunderbird, tin, nm, whatever you like, and point it at nntp://forums.novell.com and look for the novell.support.identity-manager.engine-drivers forum.



A user in the forums asked a question a little while back:



I am utterly confused.. and have a very basic question.
Whenever I set an association while on the subscriber channel with the source object, will the association be with the driver? If yes, then how do we make sure that the same destination row will be hit each time I modify the user object? (If I talk bout synchronizing data)


I thought that maybe an explanation of how Association values are used would be pertinent here.



So you have two (or more, more is left as an exercise to the reader) connected systems. You have an object in each system, and the purpose of IDM is to link the two of them.



Thus we have a Matching policy to decide who is matched, and failing a match, a Create policy to determine the rules that allow creation, and then a Placement policy to tell us where to place newly created objects.



If you have not already, please read David Gersic's excellent series of articles that walks step by step through the process flow:







But once they are created, it would be needlessly inefficient to match on every event. Imagine the horrible overhead! Queries into other systems are 'slow' in the grand scheme of things, and costly. Watch it in trace sometime. You will often see second-long delays, depending on the system, while waiting for queries to respond. Plus the query document needs to be processed through the driver rules as well. Now to be fair, some systems are slower than others, and the eDirectory and Active Directory drivers are usually pretty quick for queries. Nonetheless, in general, if you can avoid needless queries, it is better.



Interestingly enough, the engine is pretty darn smart about reusing the data from a previous query (aka caching it) and also about reading ahead in the policy object to bundle upcoming queries for attributes into one single query event that preloads the cache. It is sort of disconcerting to watch in Dstrace when a query for an attribute you know is expected actually queries for three or four other attributes as well. Until I realized what was going on, I could not understand why it was querying for these extra attributes when all it needed was a single one for the current rule.



But since much of the overhead is in the connection, crossing the system boundaries, and so on, it turns out that the actual retrieval of the data, once the target user is found, is usually quite quick. Thus this is a really nice efficiency optimization. Imagine an LDAP query that first needs to find the object and then retrieve either a single attribute or three. The cost in terms of time to find the object is the same (usually, discounting any caching the LDAP server might have done for previous queries), and retrieving one or three attributes is barely different. Thus you save the extra two instances of the search overhead.
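To make the read-ahead idea concrete, here is a minimal sketch in Python. It is a toy model only, not how the engine is actually implemented: `FakeDirectory`, `ReadAheadCache`, and the attribute names are all made up for illustration. The point is simply that three attribute reads cost one search instead of three.

```python
class FakeDirectory:
    """Hypothetical stand-in for a connected system; counts the searches it receives."""

    def __init__(self, entries):
        self.entries = entries
        self.search_count = 0

    def search(self, key, attrs):
        # Every call pays the (expensive) cost of locating the object once.
        self.search_count += 1
        entry = self.entries[key]
        return {attr: entry.get(attr) for attr in attrs}


class ReadAheadCache:
    """Bundles the attributes that upcoming rules will need into a single search."""

    def __init__(self, directory, key, upcoming_attrs):
        self.directory = directory
        self.key = key
        self.upcoming_attrs = list(upcoming_attrs)
        self.cache = {}

    def get(self, attr):
        if attr not in self.cache:
            # Ask for the requested attribute plus everything known to be needed later.
            wanted = [attr] + [a for a in self.upcoming_attrs if a != attr]
            self.cache.update(self.directory.search(self.key, wanted))
        return self.cache[attr]


directory = FakeDirectory(
    {"jsmith": {"Surname": "Smith", "Given Name": "John", "Title": "Engineer"}}
)
cache = ReadAheadCache(directory, "jsmith", ["Surname", "Given Name", "Title"])
values = [cache.get("Surname"), cache.get("Given Name"), cache.get("Title")]
print(values, directory.search_count)
```

Only the first `get` reaches the fake directory; the other two are answered from the cache.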



Sometimes you want to get the data again, not using the cache, which can be interesting, and I talk about that more in this series on the use of the Destination Attribute, Source Attribute, and Operation Attribute.




Therefore, in order to avoid this costly situation we need to store something that identifies the users in both systems.



In an ideal world it would be a two way link, with some kind of attribute on both sides of the fence. (The eDir-eDir driver actually does do that! There is an Assoc value on the objects on both sides of the directory.) That would help make things faster!



Alas, it is hard enough getting the AD guys to let us install a Remote Loader and the password sync filters, getting the Domino guys to let us install ndsrep.exe on their Domino servers, getting the AS400 guys to let us install the Remote Loader on the AS400, and so on, let alone extending their schema and storing data on every single object we use. Like that's going to happen. On a side note, what is up with Active Directory admins? Why are they so uptight about schema? Even a simple schema change is a big deal with them.



So assume for a moment that a two way link is like the solid gold potty of Austin Powers fame. We all want one, but it just ain't in the cards baby.



What is our next best option? Well we are enlightened people, using eDirectory where schema is meant to be useful and used, not feared and locked up in the monastery. So there is an eDir attribute called DirXML-Associations.



Next, we have a bit of info we need to store. One of the many neat things about eDirectory is the set of interesting schema syntaxes that exist. They are predefined, and as far as I know, there is no way for us mere peons to add new schema syntax types.



In this case, DirXML-Associations is using a syntax type called Path syntax, which was designed to describe a file, in the file system. You can read more about some of the interesting eDirectory attribute syntax types that are available in these articles I wrote:






A DirXML-Associations value is made up of three components:



State: a value from 0 to 5 that indicates what state the association is in. 1 is what you want to see, 0 means ignore, and the rest are no longer very relevant in IDM 3.5 and higher. This is the nameSpace component of the attribute syntax, which was meant to represent the name space number (DOS, OS2, MAC, etc. were assigned numbers starting at 0) but is really just a 32 bit integer field.



Driver DN: This allows us to have more than one association per object, as each association belongs to a single driver instance via a DN reference to the driver. It does mean you should only have one association per driver, but the schema does not enforce that; you could set multiple DirXML-Associations values for the same driver on one object by hand. I am not sure why you would do it, and I can pretty much guarantee it will break Identity Manager for that user, but you could! This is the component called volume in the attribute syntax, which was meant to represent the DN of the volume object holding the file.



Assoc Value: This is the unique identifier in the other system (not eDir) that allows us to skip matching each time and tell the other system, give me this guy please! Or modify this specific guy. It needs to be something unique, because that means with a single search we can definitely find the other object in that target system.
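As a rough sketch of how those three parts hang together, here is a small Python model. Everything in it is illustrative: the `Association` class, the DNs, and the sample values are made up, and the `driverDN#state#value` text form that `parse_association` accepts is an assumption about how the value is rendered as a string, so check it against what your own tools actually show.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Association:
    driver_dn: str  # the "volume" component: DN of the driver object
    state: int      # the "nameSpace" component: 1 is the state you want to see
    value: str      # the unique identifier in the connected system


def parse_association(text: str) -> Association:
    # ASSUMPTION: the text looks like "<driver DN>#<state>#<value>".
    # Splitting at most twice keeps any '#' inside the value intact;
    # a '#' inside the DN itself is not handled by this sketch.
    driver_dn, state, value = text.split("#", 2)
    return Association(driver_dn, int(state), value)


def association_for(values: Iterable[Association], driver_dn: str) -> Optional[Association]:
    # One object can carry many associations, but only one per driver.
    matches = [a for a in values if a.driver_dn.lower() == driver_dn.lower()]
    if len(matches) > 1:
        raise ValueError(f"more than one association for {driver_dn}")
    return matches[0] if matches else None


# Hypothetical values for one user associated with two drivers.
user_associations = [
    parse_association("cn=JDBC,cn=DriverSet,o=acme#1#USERID=123,TABLE=tableName,SCHEMA=schemaName"),
    parse_association("cn=AD,cn=DriverSet,o=acme#1#0123456789abcdef0123456789abcdef"),
]
jdbc = association_for(user_associations, "cn=JDBC,cn=DriverSet,o=acme")
print(jdbc.state, jdbc.value)
```

The lookup by driver DN is the important bit: the same user can be linked to many systems, and each driver only ever looks at its own value.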



Now each connected system is very different and has different ideas about what is the truly unique identifier. In Lotus Notes, every document has a UNID (Universal ID) that is a 32 character hex string. (Hmm, 16 to the power of 32 is about 3.4 times ten to the 38th possible values, the same as a 128 bit counter, because 16 to the 32 is 2 to the 4, all to the 32, which is 2 to the 128. That's a lot!) Active Directory uses the 128 bit GUID that Active Directory maintains. A system like Unix or Linux running NIS or NIS drivers, on the other hand, can only really use the object name as the unique identifier. Older systems have this issue. The AS400 drivers and the mainframe drivers usually reference the name of the object as the unique identifier.
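If you want to check that arithmetic yourself, a few lines of Python will do it. The `uuid` value here is just a convenient random 128 bit number to show that 32 hex characters and 128 bits are the same amount of space; it is not a real Notes UNID or Active Directory GUID.

```python
import uuid

# 16 is 2 to the 4th, so 16 to the 32nd is 2 to the (4 * 32) = 2 to the 128th.
assert 16 ** 32 == (2 ** 4) ** 32 == 2 ** 128

total = 2 ** 128
print(f"{total:.2e}")  # about 3.4 times ten to the 38th

# A random 128 bit identifier written as hex: 32 characters, 4 bits each.
sample = uuid.uuid4().hex
print(len(sample), len(sample) * 4)
```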



With so many drivers out there, each one having a different notion of what is a unique value, I started this article, which quickly grew out of control (but in a good way!!), to try and get all the known patterns into a single location to make looking them up easier.





Now with that preamble out of the way, let's try and answer your questions:



So the association value is stored on the object (User in this case) in eDirectory as the multi valued and multi part (aka structured) attribute DirXML-Associations. There is one association per object, for each driver to which they are associated. It is stored on the user, per driver.



The reference for a simple JDBC driver is typically the Row, in the table, in the schema, as you have posted in trace before in this thread. (Let's focus on the simple case first.)



Thus a Sub channel event (Change of Last name for example in eDir) would send a modify to the Row referenced by the association value, and ask the DB to change the Last Name for that Row.
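This is really the answer to "how do we make sure the same destination row is hit each time": the association carries the row's key, so the change can be aimed at exactly that row. Here is a minimal sketch using Python's built-in `sqlite3` module. The table `usr`, its columns, and the `apply_surname_change` helper are all hypothetical, and the real JDBC driver builds its own SQL; this only illustrates the idea.

```python
import sqlite3

# Hypothetical destination table with two users who share a last name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usr (USERID INTEGER PRIMARY KEY, LNAME TEXT)")
conn.executemany("INSERT INTO usr VALUES (?, ?)", [(123, "Smith"), (124, "Smith")])


def apply_surname_change(conn, userid, new_surname):
    # The key from the association pins the update to one row,
    # so no matching on name or other attributes is needed.
    cur = conn.execute(
        "UPDATE usr SET LNAME = ? WHERE USERID = ?", (new_surname, userid)
    )
    return cur.rowcount


changed = apply_surname_change(conn, 123, "Jones")
rows = conn.execute("SELECT USERID, LNAME FROM usr ORDER BY USERID").fetchall()
print(changed, rows)
```

Even though both rows started out with the same last name, only the row named by the key changes.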



Going the other way, on the Pub channel, a change in the DB is detected either by triggerless mode's poll cycle, which looks at the timestamps of values in rows and columns, or by a trigger set up in the DB to generate an event when the Row's Last name column value changes. Either way, the change is sent to the driver shim, including the information about which row, table, and schema it came from.



The engine picks up this value. You should see a node that looks like

<association>USERID=123,TABLE=tableName,SCHEMA=schemaName</association>

in the event before it completes the Pub-Event Transform, which shows what the driver shim thinks the reference to the changed object is.
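To see that such a value really does carry everything needed to find the row again, here is a small Python sketch that pulls it apart. The `parse_jdbc_association` helper is made up for illustration, and it assumes the simple comma-separated NAME=value shape shown above; values that themselves contain commas or equals signs are not handled.

```python
def parse_jdbc_association(text):
    """Split 'NAME=value,NAME=value,...' into a dictionary (simple case only)."""
    parts = {}
    for pair in text.split(","):
        name, _, value = pair.partition("=")
        parts[name.strip().upper()] = value.strip()
    return parts


assoc = parse_jdbc_association("USERID=123,TABLE=tableName,SCHEMA=schemaName")
print(assoc["SCHEMA"], assoc["TABLE"], assoc["USERID"])
```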



If the engine cannot find this reference in eDir on a User, then it is an operation on an un-associated object and runs through Match, Create, and then Placement policies.



If the Matching rule does find an object, the engine skips ahead to the Pub-Command Transform (having previously completed the Event Transform).
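The decision the engine makes here can be sketched as a few lines of Python. This is a toy model of the flow described above, not the engine's actual code: the function name, the outcome labels, the dictionary standing in for eDirectory, and the sample DN are all assumptions made for illustration.

```python
def route_publisher_event(assoc_value, associations, find_match, may_create, place):
    """Decide how to handle an event, given the association value the shim sent.

    associations is a plain dict standing in for eDirectory:
    association value -> DN of the associated object.
    """
    dn = associations.get(assoc_value)
    if dn is not None:
        # Already associated: no Match, Create, or Placement needed.
        return "associated", dn

    dn = find_match()
    if dn is None:
        if not may_create():
            return "vetoed", None
        dn = place()
        outcome = "created"
    else:
        outcome = "matched"

    # Remember the link so the next event skips all of the above.
    associations[assoc_value] = dn
    return outcome, dn


associations = {}
key = "USERID=123,TABLE=tableName,SCHEMA=schemaName"
new_dn = "cn=jsmith,ou=users,o=acme"  # hypothetical placement result

first = route_publisher_event(key, associations, lambda: None, lambda: True, lambda: new_dn)
second = route_publisher_event(key, associations, lambda: None, lambda: True, lambda: new_dn)
print(first[0], second[0])
```

The first event for the row goes through the full Match, Create, and Placement path; every event after that finds the association and goes straight through.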



Ok, so now on to your more complex DB case.



Ok, in my case I have 2 tables, one which stores his data and has a primary key, the other one has a foreign key dependency on the first one and stores the role. I am detecting change on this second one which doesn't have a primary key. Also, I update them by hitting dest data store directly (with dest command processor).. I think I should have told these things earlier..Apologies!!


I don't know the answer. (Those who can do, those who can't ice skate... I am a consultant now, and thus more of an idea rat... (Anyone remember that Dilbert episode?))



Can you store the primary key of the second table on the user in eDir as part of the process? I.e. There is a reference in the second table to the primary key of the first table, (aka Foreign key). When that is set, somehow store a reciprocal link on the user in eDir, where schema is flexible and useful?



Perhaps sync that second table as a second set of objects? Dunno what class, heck make one up, or pick one. There are hundreds of interesting object classes in basic eDirectory schema, and even in common schema extensions that you can leverage.



Then references between them can be maintained by the engine?



On a side note, you can execute SQL commands via the driver on the Sub channel pretty trivially using policy if needed: Using the JDBC Driver and Direct SQL



What must change on the User object in eDir when the 'role' (aka second) table changes?



Walking down the tracks that this train of thought takes might help out in this case.

