Using XPATH to Do Schema Mapping in a Rule

Want to do schema mapping in XPATH?

Novell Identity Manager has a number of usable languages for managing and converting events.



It all started with XSLT (Extensible Stylesheet Language Transformations), which was useful, but has since been heavily eclipsed by the primary language, DirXML Script. There are many reasons for this, but perhaps the best is how much easier it is to troubleshoot and work through issues with DirXML Script than with XSLT.



In DirXML Script, if you enable tracing, each rule along the way shows the document before and after it runs, and each token shows up in the trace as it executes. Thus you can see what is happening quickly; there is no need to add specific debug code, it is just built in. On top of that, you can turn tracing on or off at the driver level quickly: just change the value of the DirXML-TraceLevel attribute on the driver object in eDirectory. It does not matter how you do it (via ConsoleOne, LDAP, iManager, Designer, or even dxcmd), and it takes effect live. This is great, as there is a definite performance penalty to running with trace on, although it is worth paying most of the time. Then Identity Manager 3.5 added an option on every token to disable tracing for that individual token.



On top of all that, DirXML Script is the direction Novell is taking the product, and DirXML Script is actually considered to be faster than the XSLT implementation!



For more information on reading and understanding DSTrace, you can read the following excellent articles:








Both DirXML Script and XSLT inherit an additional language, XPATH, the XML Path language, which turns out to be quite powerful, often in ways you would not expect!



However, XPATH is probably the hardest aspect of Novell's Identity Manager product for most people to understand. I have been writing articles that work through interesting real-world examples of XPATH, to try to make it more understandable. Here is what I have written so far specifically about XPATH:



XPATH Concepts:











XPATH Cool tips:








On top of that, I often find myself spending a lot of time explaining XPATH in other articles, such as this article on the Attribute tokens (Attribute, Source Attribute, Operational Attribute, Destination Attribute), where I also list the XPATH equivalents of those tokens:





I have been working on a series of articles taking apart other default driver configurations, most recently the Compliance Management Platform (CMP) version of some common drivers, and tracking them at:
Detailed driver walk through collection



The SAP HR and SAP Business Logic drivers in the CMP versions have lots and lots of interesting XPATH in them, and you can read these specific articles for some of those details:






Recently I needed to do some fun things in XPATH, and while they seemed hard in principle, they turned out to be easy, straightforward, and understandable in practice!



I was working on a SOAP driver talking to Salesforce.com, where there was really no Application Schema available. I actually wrote a policy to read back the SOAP schema and convert it (for the most part) into the DirXML-ApplicationSchema attribute format, but in the end I did not use it: there is so MUCH schema in this system that it was killing performance when viewing the Driver object, without providing much help, so I stopped. (That was another fun XPATH exercise, but very specific to that one SOAP example, mostly brute force, and not really elegant.)


Thus I needed to manage five different attribute syntaxes between eDirectory attributes and Salesforce.com attributes: String, Distinguished Name (DN, which IDM likes to represent as com\Acme\Users\Geoffc), Date (01-27-2010), Date Time (01-27-2010 00:12:23.123Z), and List (which seems to be the only way the connected system, Salesforce.com, handles multi-valued attributes).



Time is the most common thing that needs converting. eDirectory has a Time syntax that will probably have to be seriously revamped in the near future, since it uses a 32 bit integer to count seconds since the very beginning of 1970. That value can be read as signed or unsigned (unsigned means about 4 billion positive values, whereas signed means about 2 billion positive values and 2 billion negative values). The main problem is that 2,147,483,647 seconds (the largest signed 32 bit value) after the start of 1970 runs out on January 19, 2038; an unsigned reading lasts until 2106. Thus we have a Year 2038 problem coming up, a sort of Y2K38.



Other systems count 100 nanosecond intervals (so tenths of microseconds) since the beginning of the year 1601, using a 64 bit integer, which has a lot of room: a 64 bit number is so amazingly big that it is hard to imagine it ever running out! (Well, not in my lifetime anyway!)
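To make the two conventions concrete, here is a small Python sketch, not IDM policy (in a driver the Convert Time token does this for you). It converts an eDirectory Time value (seconds since 1970-01-01 UTC) to 100 nanosecond ticks since 1601-01-01 UTC, the Windows FILETIME convention that Active Directory uses for attributes such as pwdLastSet, and back, and prints where 32 bit second counters run out. The function and constant names are my own.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds from 1601-01-01 to 1970-01-01: 369 years, 89 of them leap years.
EPOCH_GAP_SECONDS = 11_644_473_600
TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds


def edir_to_ticks(edir_seconds: int) -> int:
    """Seconds since 1970 -> 100 ns ticks since 1601."""
    return (edir_seconds + EPOCH_GAP_SECONDS) * TICKS_PER_SECOND


def ticks_to_edir(ticks: int) -> int:
    """100 ns ticks since 1601 -> whole seconds since 1970 (sub-second part dropped)."""
    return ticks // TICKS_PER_SECOND - EPOCH_GAP_SECONDS


def edir_to_datetime(edir_seconds: int) -> datetime:
    """Seconds since 1970 -> timezone-aware UTC datetime."""
    return UNIX_EPOCH + timedelta(seconds=edir_seconds)


if __name__ == "__main__":
    print(edir_to_ticks(0))             # 116444736000000000
    print(edir_to_datetime(2**31 - 1))  # 2038-01-19 03:14:07+00:00 (signed limit)
    print(edir_to_datetime(2**32 - 1))  # 2106-02-07 06:28:15+00:00 (unsigned limit)
```

The only real work is the fixed gap of 11,644,473,600 seconds between the two epochs, plus a factor of ten million between seconds and ticks.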



You will see lots of rules in drivers converting time. The Convert Time token is so easy to use, combined with the Reformat Operational Attribute action, that managing this is almost a non-issue now. The Active Directory driver uses a Java class call to do this, and continues to do so since it works fine, but that approach has really been superseded by the Convert Time token these days.



DN syntax is a tricky one: we want to store DN values in eDirectory, but Salesforce.com's Reference syntax, which is basically the same notion, is based on an 18 character Id value. Luckily, that Id is the Association value I used in this driver, since it is really the only usable unique value in their system, and we also store it in an Id attribute on each object, so the conversion works nicely.



However, the details of HOW we convert from one syntax to the other are almost immaterial here, since there is a much bigger problem to solve first.

After surveying the schema for the attributes we use of each type, I found 3 Date Time attributes, 2 Date attributes, 4 List attributes, and 9 DN attributes.



Traditionally, you do this conversion with one-off rules, one per attribute, and that would be more than a little painful with 18 different attributes needing management and conversion. It would also need to be done twice, once in the Input transform and again in the Output transform. Thus a generic rule to handle this would be really useful.



Well, what do we need to be able to do? First off, we need to know which attributes are of each type.



I made some Global Configuration Variables (GCVs) of type List. For more on GCV types, consider reading:







I needed one GCV per syntax type, so I could tell whether a specific attribute was of that syntax. Then the same basic testing process applies to each of the attributes in the document.
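As a rough Python sketch of the idea (not DirXML Script), each List GCV becomes a named list of attribute names, and classifying an attribute is just finding which list contains it. DN-ATTRS is the GCV name this article uses; the other GCV names and all the attribute names are hypothetical.

```python
from typing import Optional

# Hypothetical stand-ins for the per-syntax List GCVs. Only DN-ATTRS comes
# from the article; the other GCV names and every attribute name are made up.
SYNTAX_GCVS = {
    "DN-ATTRS": ["Manager", "Secretary"],
    "DATE-ATTRS": ["DateOnSystem"],
    "DATETIME-ATTRS": ["LastLoginTime"],
    "LIST-ATTRS": ["Phones"],
}


def syntax_gcv_for(attr_name: str) -> Optional[str]:
    """Return the name of the GCV that lists attr_name, or None (plain string)."""
    for gcv_name, attr_names in SYNTAX_GCVS.items():
        if attr_name in attr_names:
            return gcv_name
    return None


if __name__ == "__main__":
    print(syntax_gcv_for("Manager"))  # DN-ATTRS
    print(syntax_gcv_for("Surname"))  # None
```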



This is a cute trick as well. In the Input or Output transform, you want to convert attribute syntax for three different types of events: add, modify, and instance documents. Instance documents are what you get back from a query, and they contain attributes as well.



So you want to loop through all the attributes in the document and test whether each one matches your special cases.



This turns out to be really easy:



Use a For Each action to loop through the node set defined by the XPATH expression ".//@attr-name".



That means: the current context, shown as a period (.); then slash slash (//), which means the current node or any node beneath it, at any depth (usually a bit expensive in processing, but it makes sense here); and finally @attr-name, the XML attribute called attr-name on those nodes.



This returns all the attr-name attributes, which works for all three of our cases even though the parent element names are different.



A modify event looks like this (simplified to make my point):



<modify class-name="User">
  <modify-attr attr-name="DateOnSystem">
    <remove-value>
      <value>10-24-2010</value>
    </remove-value>
    <add-value>
      <value>10-25-2010</value>
    </add-value>
  </modify-attr>
</modify>



An Add event:



<add class-name="User">
  <add-attr attr-name="DateOnSystem">
    <value>10-24-2010</value>
  </add-attr>
</add>



A query response in an Instance doc:



<instance class-name="User">
  <attr attr-name="DateOnSystem">
    <value>10-24-2010</value>
  </attr>
</instance>


But the for-each with an XPATH of .//@attr-name will loop through all three of those cases. Ah, gotta love XPATH, and wise XML schema design by the guys at Novell!
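To see why one expression covers all three shapes, here is a Python emulation using the standard library's xml.etree.ElementTree. It is only an approximation: ElementTree supports a limited XPath subset and cannot return attribute nodes, so the sketch walks each operation element and its descendants and reads attr-name wherever it appears. The wrapping <input> element and the function name are my own additions.

```python
import xml.etree.ElementTree as ET
from typing import List

# The three simplified operations from above, wrapped in one root element.
EVENTS = """<input>
  <modify class-name="User">
    <modify-attr attr-name="DateOnSystem">
      <remove-value><value>10-24-2010</value></remove-value>
      <add-value><value>10-25-2010</value></add-value>
    </modify-attr>
  </modify>
  <add class-name="User">
    <add-attr attr-name="DateOnSystem"><value>10-24-2010</value></add-attr>
  </add>
  <instance class-name="User">
    <attr attr-name="DateOnSystem"><value>10-24-2010</value></attr>
  </instance>
</input>"""


def attr_names(operation: ET.Element) -> List[str]:
    # Approximates .//@attr-name with `operation` as the context node:
    # iter() visits the element itself and every descendant (like
    # descendant-or-self), and we read attr-name wherever it is present.
    return [el.get("attr-name") for el in operation.iter() if "attr-name" in el.attrib]


if __name__ == "__main__":
    for op in ET.fromstring(EVENTS):
        print(op.tag, attr_names(op))  # e.g. modify ['DateOnSystem']
```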



OK, so now the local variable current-node holds the attribute name when used in a string context; in a node set context, it is the attr-name XML attribute node itself.



This means you can now do a simple test to see whether your current attribute name is in your list, either with the XPATH test $current-node=$DN-ATTRS or with an "If Local Variable" condition that checks whether current-node is equal to ~DN-ATTRS~.
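The XPATH form works because of how XPath 1.0 compares two node sets: A = B is true if some node in A has the same string value as some node in B. With current-node holding a single name, that reads as "is this name in the list". The Python sketch below models each node set as a list of string values, assuming the List GCV reaches XPATH as a node set with one attribute name per node; that representation is my assumption. It also shows the classic trap: != is existential too, so it is not the negation of =.

```python
from typing import Iterable


def nodeset_eq(left: Iterable[str], right: Iterable[str]) -> bool:
    """XPath 1.0 `A = B` on two node sets, given each node's string value."""
    right = list(right)
    return any(a == b for a in left for b in right)


def nodeset_ne(left: Iterable[str], right: Iterable[str]) -> bool:
    """XPath 1.0 `A != B`: true if SOME pair differs, not the negation of `=`."""
    right = list(right)
    return any(a != b for a in left for b in right)


if __name__ == "__main__":
    dn_attrs = ["Manager", "Secretary"]      # hypothetical DN-ATTRS contents
    print(nodeset_eq(["Manager"], dn_attrs))  # True
    print(nodeset_ne(["Manager"], dn_attrs))  # also True
```

So $current-node!=$DN-ATTRS would be true for Manager as well; not($current-node=$DN-ATTRS) is the safe way to say "not in the list".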



So far so good, but not much really cool XPATH here yet. Then I ran into a different problem. For a variety of reasons I developed all my reformatting rules in the Command Transform, which was just plain a mistake. I should have seen it sooner, but did not. Anyway, when I wanted to move them to the Input and Output transforms, I realized I needed to operate in the application namespace. In other words, I cannot use Internet EMail Address in my GCV; I now need to remember that this attribute is called Client_Email__c, and that is case sensitive.



The application schema in Salesforce.com was created in a pretty unplanned manner, and the spelling, capitalization, and naming patterns are all over the place and full of typos, whereas my eDirectory schema attributes are consistently named and capitalized, and far less prone to typos.



Well, this is the job of the Schema Map rule, right? But there is no simple way to say: translate this attribute name from the application namespace to the eDirectory namespace, or the converse. That would actually be a pretty cool token.



But here is where XPATH comes in handy. It turns out, you can quite trivially do it in XPATH, if you load the Schema Map into a local variable.



The good news is that all the 'stuff' in Identity Manager is stored as attributes of objects in eDirectory. That means you can use a Source Attribute or Destination Attribute token with the DN of the driver (there is a convenient GCV, dirxml.auto.driverdn, that always holds this driver's DN) to read the XmlData attribute, pass that through the Base64 Decode token and then the XML Parse token, and suddenly you have a variable with the entire Schema Map XML in it! That sounds complicated, but here is the Source Attribute version:



<do-set-local-variable name="SCHEMA-MAP" scope="policy">
  <arg-node-set>
    <token-xml-parse>
      <token-base64-decode>
        <token-src-attr class-name="DirXML-Driver" name="XmlData">
          <arg-dn>
            <token-global-variable name="dirxml.auto.driverdn"/>
          </arg-dn>
        </token-src-attr>
      </token-base64-decode>
    </token-xml-parse>
  </arg-node-set>
</do-set-local-variable>
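Outside of IDM, the same Base64 Decode then XML Parse chain looks like this in Python. This is a sketch for experimenting with a copied XmlData value; the function name is hypothetical and the sample map is a trimmed version of the one shown later in this article.

```python
import base64
import xml.etree.ElementTree as ET


def load_schema_map(xmldata_b64: str) -> ET.Element:
    """Mirror the token chain: Base64 Decode, then XML Parse."""
    raw = base64.b64decode(xmldata_b64)
    return ET.fromstring(raw)


if __name__ == "__main__":
    sample = (
        "<attr-name-map>"
        "<attr-name class-name='DirXML-WorkOrder'>"
        "<app-name>Contact</app-name><nds-name>DirXML-nwoContactName</nds-name>"
        "</attr-name>"
        "</attr-name-map>"
    )
    encoded = base64.b64encode(sample.encode("utf-8")).decode("ascii")
    print(load_schema_map(encoded).tag)  # attr-name-map
```

ElementTree hands back the root <attr-name-map> element itself, so Python paths start below attr-name-map rather than with it.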



Then you have some simple XPATH to do the conversion.



Convert App name to eDir name:

$SCHEMA-MAP/attr-name-map/attr-name[app-name/text()=string($current-node)]/nds-name/text()



Convert eDir name to App name:

$SCHEMA-MAP/attr-name-map/attr-name[nds-name/text()=string($current-node)]/app-name/text()
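Here are the same two lookups as a Python sketch against the sample Schema Map shown below, using ElementTree's limited XPath subset, which does support the [child='text'] predicate these expressions rely on. The helper names are hypothetical, and the sketch assumes names contain no single quotes, since each name is pasted into the path as a quoted literal.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SCHEMA_MAP = ET.fromstring(
    "<attr-name-map>"
    "<class-name><app-name>WorkOrder</app-name><nds-name>DirXML-WorkOrder</nds-name></class-name>"
    "<attr-name class-name='DirXML-WorkOrder'>"
    "<app-name>Contact</app-name><nds-name>DirXML-nwoContactName</nds-name>"
    "</attr-name>"
    "</attr-name-map>"
)


def _lookup(match_tag: str, name: str, result_tag: str) -> Optional[str]:
    # Same shape as attr-name[app-name/text()='X']/nds-name/text().
    hit = SCHEMA_MAP.find(f"attr-name[{match_tag}='{name}']/{result_tag}")
    return hit.text if hit is not None else None


def app_to_nds(app_name: str) -> Optional[str]:
    """Application name -> eDirectory name (match app-name, return nds-name)."""
    return _lookup("app-name", app_name, "nds-name")


def nds_to_app(nds_name: str) -> Optional[str]:
    """eDirectory name -> application name (match nds-name, return app-name)."""
    return _lookup("nds-name", nds_name, "app-name")


if __name__ == "__main__":
    print(app_to_nds("Contact"))                # DirXML-nwoContactName
    print(nds_to_app("DirXML-nwoContactName"))  # Contact
```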



That's all it takes, for the most part! Cool, eh?



Let's parse that to explain what is going on.



You do need to understand what the XML in the Schema Map object looks like; here is a simple sample:



<attr-name-map>
  <class-name>
    <app-name>WorkOrder</app-name>
    <nds-name>DirXML-WorkOrder</nds-name>
  </class-name>
  <attr-name class-name="DirXML-WorkOrder">
    <app-name>Contact</app-name>
    <nds-name>DirXML-nwoContactName</nds-name>
  </attr-name>
</attr-name-map>



So: inside our SCHEMA-MAP variable, take the attr-name-map node, then its child attr-name nodes, keeping only those (this is the predicate, in square brackets) whose child app-name node's text equals the string value of current-node, and from that attr-name node select the text of its child nds-name node.



Reverse app-name and nds-name and you go the other way.



There are a number of issues with this simplistic approach (for example, when the same attribute is mapped differently depending on the object class), but those are easy to handle: just extend the predicate to require the class-name XML attribute to match as well. There is an additional wrinkle with non-class-specific mappings, whose <attr-name> node has no class-name XML attribute at all. But you can work around all of these too.
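One way to handle both refinements, sketched in Python: look for an entry whose class-name matches the object's class first, and only then fall back to an entry with no class-name at all. The map and attribute names here are made up. In XPATH you could express each step as its own predicate, for example comparing @class-name with the class held in a hypothetical $class variable for the first lookup, and using not(@class-name) for the fallback.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical map: Phone is mapped per class, Title has a non-class-specific
# entry (no class-name XML attribute). All names are made up.
SCHEMA_MAP = ET.fromstring(
    "<attr-name-map>"
    "<attr-name class-name='User'><app-name>Phone</app-name><nds-name>Telephone Number</nds-name></attr-name>"
    "<attr-name class-name='Organization'><app-name>Phone</app-name><nds-name>OrgPhone</nds-name></attr-name>"
    "<attr-name><app-name>Title</app-name><nds-name>Title</nds-name></attr-name>"
    "</attr-name-map>"
)


def app_to_nds(app_name: str, class_name: str) -> Optional[str]:
    candidates = [e for e in SCHEMA_MAP.findall("attr-name")
                  if e.findtext("app-name") == app_name]
    # First choice: an entry mapped for this specific class.
    for entry in candidates:
        if entry.get("class-name") == class_name:
            return entry.findtext("nds-name")
    # Fallback: a non-class-specific entry (no class-name XML attribute).
    for entry in candidates:
        if entry.get("class-name") is None:
            return entry.findtext("nds-name")
    return None


if __name__ == "__main__":
    print(app_to_nds("Phone", "Organization"))  # OrgPhone
    print(app_to_nds("Title", "User"))          # Title
```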



Now when I load up my GCV values into variables, I can for-each through the GCV nodeset, and add a value to a variable that is in the correct namespace.
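A Python sketch of that loop: walk the eDirectory names from a GCV and collect the application name for each one. Client_Email__c and Internet EMail Address come from this article; the other names are made up. Keeping unmapped names unchanged is my own choice for the sketch, so check that it matches how your map is meant to behave.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical schema map fragment for the Salesforce.com driver.
SCHEMA_MAP = ET.fromstring(
    "<attr-name-map>"
    "<attr-name class-name='User'><app-name>Client_Email__c</app-name>"
    "<nds-name>Internet EMail Address</nds-name></attr-name>"
    "<attr-name class-name='User'><app-name>ManagerId</app-name>"
    "<nds-name>manager</nds-name></attr-name>"
    "</attr-name-map>"
)


def to_app_names(nds_names: List[str]) -> List[str]:
    """For-each over the GCV values, building a list in the application namespace."""
    by_nds = {e.findtext("nds-name"): e.findtext("app-name")
              for e in SCHEMA_MAP.findall("attr-name")}
    # Unmapped names are kept as-is (an assumption made for this sketch).
    return [by_nds.get(name, name) for name in nds_names]


if __name__ == "__main__":
    print(to_app_names(["Internet EMail Address", "manager"]))
    # ['Client_Email__c', 'ManagerId']
```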



I thought this was pretty cool and simple. I expected it to be much harder than that, but in the end it really was not!

