Examples of using XPATH in Identity Manager

Novell Identity Manager originally started as Novell DirXML and required all work to be done in XSLT (XML style sheets). XSLT is a powerful language, but it is not my personal favorite to work with.

With the release of Novell NSure Identity Manager 2.0 we saw the advent of DirXML Script, an XML-based language designed for the task of managing XML event documents. It has gotten better with each release of Identity Manager since.

Just for the heck of it, I even wrote the following article to track down what you can currently only do in XSLT, with the goal of chipping away at that list where possible:

Open Call: What Can You Do in XSLT that You Cannot Do in DirXML Script?

There have been new features that make life a lot easier, and new tokens that are very powerful.

The nicest thing about using DirXML Script is that the management tools, iManager with the Identity Manager snap-ins or Designer for Identity Manager (an Eclipse-based tool for offline editing of a project), parse the XML into a really nice GUI that lets you type it free form in XML, manipulate it in the GUI, or any combination of both. In fact, because of the way nested items (if-then code blocks or for-each loops) are shown in the GUI, it is sometimes easier to fix things by switching over to the XML view and working there.

Some examples of the various tokens and things that can be done with DirXML Script are:

One of the languages that has been available inside XSLT and DirXML Script is called XPATH, the XML Path language, which is described here: http://www.w3.org/TR/1999/REC-xpath-19991116

However there is just not enough out there in terms of how to use XPATH in an Identity Manager context for people learning Identity Manager.

I have been working hard on that topic, and you can read some of my articles on the topic at:

XPATH General Concepts:

XPATH Cool tips:

I thought a general article with basic examples of various things you might do in XPATH would be useful, so let's see how this effort goes.

I personally find that seeing the XML document I am trying to apply my XPATH to makes it easier to visualize, understand, and work with. Here is a sample XDS document (XDS is Novell's XML dialect for Identity Manager). Let's use a simple add document that has a bunch of interesting things in it.

<nds dtdversion="3.5" ndsversion="8.x">
  <source>
    <product version="">DirXML</product>
    <contact>Novell, Inc.</contact>
  </source>
  <input>
    <add class-name="User" cached-time="20101213201759" event-id="osg900lnx#20101213201759#1#1" qualified-src-dn="O=acme\OU=Users\OU=Applications\CN=listuser" src-dn="\ACME-IDV\acme\Users\Applications\listuser" src-entry-id="37777" dest-dn="\ACME-IDV\acme\Users\Applications\listuser">
      <association state="migrate"></association>
      <add-attr attr-name="Full Name">
        <value timestamp="1292271432#40" type="string">List Server User</value>
      </add-attr>
      <add-attr attr-name="Given Name">
        <value timestamp="1292271432#41" type="string">List Server</value>
      </add-attr>
      <add-attr attr-name="L">
        <value timestamp="1292271479#1" type="string">Somewhere</value>
      </add-attr>
      <add-attr attr-name="Login Disabled">
        <value timestamp="1292271432#25" type="state">false</value>
      </add-attr>
      <add-attr attr-name="nspmDistributionPassword"><!-- content suppressed -->
      </add-attr>
      <add-attr attr-name="Surname">
        <value timestamp="1292271432#37" type="string">User</value>
      </add-attr>
    </add>
  </input>
</nds>

The single most important thing you need to know about XPATH in Identity Manager is what the current context is set to. Web-based examples usually show XPATH that starts at the very top of the XML document, so you try that approach, starting with an XPATH of nds/input/add/@src-dn to select the XML attribute src-dn in the <add> event node, but it does not work.

When you test it in the XPATH builder in Designer, you might or might not get it to work, and these two issues are basically the same. The XPATH builder in Designer is generic and handles XPATH the way it can generically be handled. However, IDM starts its current context for XPATH at the event or operation node, that is, the <add>, <modify>, <delete>, <query>, or <instance> node. (There are more event document types, but you get the idea.)

To get the Designer XPATH tool to work right, you need to select the correct current context in the left hand side on the XML source document. You can read about this in more detail in the article:
XPATH and the context node

Once you know that simple fact many things get easier.

Thus in the example I used above, to select the src-dn XML attribute from the <add> node, you could just use the XPATH of:
@src-dn (Would return \ACME-IDV\acme\Users\Applications\listuser )

Similarly, here are some other common XML attributes you might want to pick out of the document:
@dest-dn (Would return \ACME-IDV\acme\Users\Applications\listuser )
@event-id (Would return osg900lnx#20101213201759#1#1 )
@cached-time (Would return 20101213201759 )
@qualified-src-dn (Would return O=acme\OU=Users\OU=Applications\CN=listuser )
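
To make the context point concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. That library is my choice for illustration only; it is not part of Identity Manager and implements just a subset of XPATH. It works on a trimmed copy of the sample document, and because ElementTree cannot evaluate a bare @src-dn, the .get() call stands in for it.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample <add> document (raw string keeps the backslashes).
XDS = r"""<nds dtdversion="3.5" ndsversion="8.x">
  <input>
    <add class-name="User" cached-time="20101213201759"
         event-id="osg900lnx#20101213201759#1#1"
         qualified-src-dn="O=acme\OU=Users\OU=Applications\CN=listuser"
         src-dn="\ACME-IDV\acme\Users\Applications\listuser"
         dest-dn="\ACME-IDV\acme\Users\Applications\listuser"/>
  </input>
</nds>"""

root = ET.fromstring(XDS)          # a generic tool starts here, at <nds>
context = root.find("input/add")   # the engine starts here, at the operation node

# Stand-in for evaluating @src-dn, @dest-dn, ... with <add> as the context.
xml_attrs = {name: context.get(name)
             for name in ("src-dn", "dest-dn", "event-id", "cached-time")}

for name, value in xml_attrs.items():
    print(f"@{name} -> {value}")
```

Asking the <nds> root for src-dn returns nothing, which is the same surprise you get when the context is not where you think it is.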

If you need to test them in a condition block, there is actually a DirXML Script token, If XML attribute, and you can test for available, equal, and their opposites. No need to even consider XPATH in this case.

That is another key thing to know about XPATH. Are you selecting? Or testing for true or false? Or are you doing math? These are different approaches and can be similar or different depending on the circumstances.

Selecting usually returns the node you specified along with everything below it in the tree, assuming you have somewhere appropriate to store it (like a local variable of type nodeset). Testing, by contrast, returns true if anything at all matches and false if nothing does. Calling functions (which can be string manipulation or numeric math) looks different again.

You could easily select any particular attribute value with an XPATH of something like:

add-attr[@attr-name="Login Disabled"]/value  

which would return false. You might want to add /text() at the end to be sure you get the string rather than the <value> node, but it is not really needed.
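
The difference between selecting and testing can be sketched in Python with xml.etree.ElementTree, whose XPATH subset happens to accept this particular expression as written. This is only an illustration outside the engine, and the attribute name Title in the second test is a made-up example of something that is not in the document.

```python
import xml.etree.ElementTree as ET

ADD_EVENT = """<add class-name="User">
  <add-attr attr-name="Login Disabled">
    <value timestamp="1292271432#25" type="state">false</value>
  </add-attr>
  <add-attr attr-name="Surname">
    <value timestamp="1292271432#37" type="string">User</value>
  </add-attr>
</add>"""

context = ET.fromstring(ADD_EVENT)  # the <add> node is the context

# Selecting: returns the matching <value> node itself.
value_node = context.find("add-attr[@attr-name='Login Disabled']/value")

# Roughly what adding /text() gives you: the text inside the node.
login_disabled = value_node.text

# Testing: true if anything matches, false if nothing does.
has_login_disabled = value_node is not None
has_title = context.find("add-attr[@attr-name='Title']/value") is not None

print(login_disabled, has_login_disabled, has_title)
```

Note that the test is true because a node exists, even though the text of that node is the word false.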

For fun, you could select all nodes that have a syntax type of, say, string, with the following XPATH:

add-attr[value/@type="string"]

That says: all <add-attr> nodes whose <value> subnode has the XML attribute type set to string.
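
Here is a rough Python sketch of picking out the <add-attr> nodes by the type of their <value>. ElementTree only implements a subset of XPATH and does not take a nested path inside a predicate, so the "has a string-typed value" condition is applied in Python instead; the result is meant to match the selection described above.

```python
import xml.etree.ElementTree as ET

ADD_EVENT = """<add class-name="User">
  <add-attr attr-name="Full Name">
    <value type="string">List Server User</value>
  </add-attr>
  <add-attr attr-name="L">
    <value type="string">Somewhere</value>
  </add-attr>
  <add-attr attr-name="Login Disabled">
    <value type="state">false</value>
  </add-attr>
  <add-attr attr-name="Surname">
    <value type="string">User</value>
  </add-attr>
</add>"""

context = ET.fromstring(ADD_EVENT)

# Keep each <add-attr> that has at least one <value type="string"> child.
string_attrs = [
    add_attr
    for add_attr in context.findall("add-attr")
    if add_attr.find("value[@type='string']") is not None
]

names = [add_attr.get("attr-name") for add_attr in string_attrs]
print(names)
```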

It also turns out a lot of the built in DirXML Script tokens can be replicated in XPATH. In fact it was probably the other way around, where many common tasks that were being done very often in XPATH were bundled into built in functionality.

The first examples to work through are the four attribute tokens:

Source Attribute
Destination Attribute
Operational Attribute
Attribute

Source Attribute reads from the source system, depending on the channel it is called in, that is either eDirectory (when called on the Subscriber channel) or the connected system (when called on the Publisher channel).

This performs a query, and the XPATH to do it would look something like this:
query:readObject($srcQueryProcessor, association, @src-dn, @class-name, 'CN')/attr/value 

This uses the Java query function that Novell provides as part of IDM. It tries to find the user by the association value, for the attribute CN.

Destination Attribute is much the same as Source Attribute; it just looks in the other direction. That would be the connected system on the Subscriber channel, and eDirectory if called on the Publisher channel.

query:readObject($destQueryProcessor, association, @dest-dn, @class-name, 'CN')/attr/value

This makes a call to the destQueryProcessor instead of the srcQueryProcessor. Both variables are defined by default by the engine.

Next up is Operational Attribute, which looks at the current event document and pulls the value out of the document. This is a little trickier, as it could be an <add>, a <modify>, or an <instance> event, and the way the values are represented in the XDS document is slightly different for each. The XPATH would be:

((add-attr|attr)[@attr-name = 'CN']) | (modify-attr[@attr-name = 'CN']/add-value)

This looks for an <add-attr> node (from an <add> event) or an <attr> node (from an <instance> document), using the pipe (|) symbol for OR, where the XML attribute attr-name in that node is CN. (The square brackets are predicates and let you specify conditions on the node you are trying to match.) Then there is another OR symbol, because it might also be a <modify> event, where the node we want would be under a <modify-attr> node with the same predicate of @attr-name='CN', and then an <add-value> node under the <modify-attr>.
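
The three branches of that union can be emulated in Python to see which one fires for which kind of document. ElementTree has no pipe operator, so this sketch simply tries one path per branch and collects the <value> text; the function name operational_values and the sample events are made up for illustration.

```python
import xml.etree.ElementTree as ET

# One path per branch of the union; {0} is the attribute name.
BRANCHES = (
    "add-attr[@attr-name='{0}']",               # <add> events
    "attr[@attr-name='{0}']",                   # <instance> documents
    "modify-attr[@attr-name='{0}']/add-value",  # <modify> events
)

def operational_values(context, attr_name):
    """Collect the text of every <value> found under any branch."""
    values = []
    for branch in BRANCHES:
        for node in context.findall(branch.format(attr_name)):
            values.extend(v.text for v in node.iter("value"))
    return values

add_event = ET.fromstring(
    "<add><add-attr attr-name='CN'><value>listuser</value></add-attr></add>")
modify_event = ET.fromstring(
    "<modify><modify-attr attr-name='CN'><remove-all-values/>"
    "<add-value><value>newname</value></add-value></modify-attr></modify>")
instance_doc = ET.fromstring(
    "<instance><attr attr-name='CN'><value>listuser</value></attr></instance>")

for doc in (add_event, modify_event, instance_doc):
    print(doc.tag, operational_values(doc, "CN"))
```

Only one branch normally matches a given document, which is why the single combined expression works across all three event types.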

You can see this getting a little bit complex and that the Operational attribute token is quite a bit easier than doing it in XPATH. But wait, there's more!

The Attribute token is a pretty darn cool one, since it uses the current document's value (the operational attribute) and, if that is not found, falls back to querying the source for the attribute. Even better, it will also read out of the cache that persists for the duration of the rule. You may not have noticed, but the engine looks ahead in your policy, and if it sees that you are going to query the same object for different attributes it combines them into a single query, which is far more efficient than, say, three separate queries for different attributes on the same object.

The data it reads out is cached for the duration of the Policy object, and the Attribute token will read from that cache, making it even more efficient.

To do this in XPATH, you actually lose the benefit of that caching and read-ahead querying, alas. But the XPATH is pretty cool, since it is really a combination of the Operational Attribute XPATH and the Source Attribute XPATH, leading us to:

((((add-attr|attr)[@attr-name = 'CN']) | (modify-attr[@attr-name = 'CN']/add-value)) | query:readObject($srcQueryProcessor, association, @src-dn, @class-name, 'CN')/attr/value)

That's easy to read, right? Well, it is if you realize it is just the two XPATH statements for Operational Attribute and Source Attribute combined inside a pair of round brackets with a pipe between them. One caveat: the pipe is a union rather than a true fallback, so the result includes whatever both sides return.
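
The fallback behavior described for the Attribute token (use the document if it has the value, otherwise query the source) can be sketched in Python as follows. This is only a model of the idea: fake_query is a hypothetical stand-in for the query:readObject call, and the caching and query merging the engine does are not modeled at all.

```python
import xml.etree.ElementTree as ET

def document_values(context, attr_name):
    """Values for attr_name found in the current event document."""
    paths = (
        f"add-attr[@attr-name='{attr_name}']/value",
        f"attr[@attr-name='{attr_name}']/value",
        f"modify-attr[@attr-name='{attr_name}']/add-value/value",
    )
    return [v.text for path in paths for v in context.findall(path)]

def attribute_values(context, attr_name, query_source):
    """Use the document if it has the attribute, otherwise query the source."""
    found = document_values(context, attr_name)
    return found if found else query_source(attr_name)

# Hypothetical stand-in for query:readObject(...); it records each call.
queried = []

def fake_query(attr_name):
    queried.append(attr_name)
    return ["from-source"]

event = ET.fromstring(
    "<add><add-attr attr-name='CN'><value>listuser</value></add-attr></add>")

cn = attribute_values(event, "CN", fake_query)            # answered by the document
surname = attribute_values(event, "Surname", fake_query)  # falls back to the query
print(cn, surname, queried)
```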

I have written up some much more complex examples, like doing Schema Mapping in XPATH, and Mapping tables in XPATH, and you can read about those approaches here:

The next thing worth discussing is the notion of functions in XPATH. There are a number of functions available, and the W3C recommendation linked above is the reference for finding them. There are some common string functions like string-length(), substring-before(), and substring() that will let you chop up a string as needed.

An example might be substring-before('geoffc@acme.com','@') to get the first part of an email address, which returns geoffc.
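
For readers who think more easily in a general-purpose language, here is a small Python equivalent of substring-before(). It follows the XPATH 1.0 rule that the result is an empty string when the second argument does not occur in the first; the Python function name is my own.

```python
def substring_before(text: str, separator: str) -> str:
    """Python equivalent of XPATH 1.0 substring-before(text, separator)."""
    if separator == "":
        # An empty string is found at the very start, so nothing precedes it.
        return ""
    head, found, _tail = text.partition(separator)
    # partition() returns the whole string as head when there is no match,
    # whereas XPATH returns an empty string in that case.
    return head if found else ""

print(substring_before("geoffc@acme.com", "@"))
print(repr(substring_before("geoffc", "@")))
```

The second call is the case worth remembering: a value with no @ in it yields an empty string, not the original value.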

There are some functions that cast values, like number() and string() that force the data to be of the specified format as best is possible.

You can do the various numeric operations you would expect: addition, subtraction, multiplication, division (div), remainder (mod), and the greater/less than comparisons.

There is a cool function called count() that counts the number of nodes in the node set passed in the brackets. For example, count(attr[@attr-name='CN']) would return the number of <attr attr-name='CN'> nodes. If this returns 0, then that could be treated as a boolean false in a true/false test.
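
A quick Python sketch of the same idea, counting matches and then treating a count of zero as false. The <instance> document and the attribute name Title are made up for illustration, and len() stands in for count().

```python
import xml.etree.ElementTree as ET

INSTANCE = """<instance class-name="User">
  <attr attr-name="CN">
    <value type="string">listuser</value>
  </attr>
  <attr attr-name="Surname">
    <value type="string">User</value>
  </attr>
</instance>"""

context = ET.fromstring(INSTANCE)  # the <instance> node is the context

# Stand-in for count(attr[@attr-name='CN'])
cn_count = len(context.findall("attr[@attr-name='CN']"))
# An attribute that is not in the document gives a count of 0.
title_count = len(context.findall("attr[@attr-name='Title']"))

print(cn_count, bool(cn_count))
print(title_count, bool(title_count))
```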

Next up is the use of variables. You can reference local variables (both driver scoped and policy scoped) as well as Global Configuration Variables (GCVs) using the $VarName notation, with a leading dollar sign to indicate a variable. Do recall that variable replacement elsewhere in DirXML Script uses a leading AND a trailing $ sign, so it is very easy to get confused! You can also use the ~GCVName~ notation in XPATH, but it will be replaced at driver start with a literal string, so you might need to surround it in quotation marks if the context requires it.

Thus you could set local variable (type nodeset) TEST to Query for the 'DirXML-Associations' values for an object. You would get back a slightly complicated multi valued attribute that might look something like this:

You could use a for-each loop over that local variable, and then inside the loop the current-node local variable will be the current node you are looping through.

Since the context is still the <instance> node, if more than one object is being returned, you would have multiple <instance> documents, so you would loop through those first, where the current-node would be the <instance> node. Then using another for-each loop, you could loop over the local variable $current-node/attr[@attr-name='DirXML-Associations']/value to loop through all the various values.

Now DirXML-Associations is fun, as it is a structured attribute with three components, which you might want to select or test with. Inside this inner loop, where the current node context is one of the <value> nodes (there will be many, one per driver the object is associated with), you could test if XPATH true:
$current-node/component[@name='volume']=$dirxml.auto.driverdn
which utilizes one of the built-in GCVs.
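
The nested loops can be sketched in Python like this. The query result below is hypothetical: the DNs and association values are made up, and the nameSpace and path component names are my assumption about how the other two components of the structured value are labeled (only volume is named above). The driver_dn variable stands in for the $dirxml.auto.driverdn GCV.

```python
import xml.etree.ElementTree as ET

# Hypothetical query result; the DNs and association values are made up.
QUERY_RESULT = r"""<nds>
  <output>
    <instance class-name="User" src-dn="\ACME-IDV\acme\Users\jsmith">
      <attr attr-name="DirXML-Associations">
        <value type="structured">
          <component name="nameSpace">1</component>
          <component name="volume">\ACME-IDV\acme\Drivers\Driver A</component>
          <component name="path">assoc-a-0001</component>
        </value>
        <value type="structured">
          <component name="nameSpace">1</component>
          <component name="volume">\ACME-IDV\acme\Drivers\Driver B</component>
          <component name="path">assoc-b-0002</component>
        </value>
      </attr>
    </instance>
  </output>
</nds>"""

# Stand-in for the $dirxml.auto.driverdn GCV.
driver_dn = r"\ACME-IDV\acme\Drivers\Driver B"

matches = []
# Outer loop: one pass per <instance> node returned by the query.
for instance in ET.fromstring(QUERY_RESULT).iter("instance"):
    # Inner loop: one pass per <value> of DirXML-Associations.
    for value in instance.findall("attr[@attr-name='DirXML-Associations']/value"):
        # Same test as component[@name='volume'] = $dirxml.auto.driverdn
        if value.findtext("component[@name='volume']") == driver_dn:
            matches.append(value.findtext("component[@name='path']"))

print(matches)
```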

Where things get really complicated is when you combine a couple of predicates and steps along the way. I find it easier to break it up into pieces to try to understand what each piece is doing, one step at a time.

One confusing thing is that you could have a fairly complex XPATH like:

Where there are multiple predicates. I am suggesting a list GCV, inside a GCV group (this is immaterial unless you like to make pretty GCV layouts) that has three different predicates along the way.

I want to look in the DriverConfig local variable (pretend I read back the driver's DirXML-ConfigValues attribute, base64 decoded it, and XML parsed it), find the <definitions> node, then the <group> node whose XML attribute name is GroupName, then under that the <definition> node whose XML attribute name is SomeGCV, and since this is a list GCV, it has a <value> node, from which I take the third <item> node. The predicate [3] is really shorthand for [position()=3], which is sort of a weird notation to me, but whatever works. By taking it apart, it is much easier to understand.
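
Here is a small Python sketch of walking that path one step at a time and then in one go. The XML is a made-up stand-in that follows the description above (the <configuration-values> wrapper and the item contents are my assumptions), not a real driver's DirXML-ConfigValues, and the path string is a reconstruction from that description. ElementTree accepts the [3] shorthand but not the longer [position()=3] form.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already decoded and parsed configuration values.
DRIVER_CONFIG = """<configuration-values>
  <definitions>
    <group name="GroupName">
      <definition name="SomeGCV" type="list">
        <value>
          <item>first</item>
          <item>second</item>
          <item>third</item>
        </value>
      </definition>
      <definition name="OtherGCV" type="string">
        <value>unrelated</value>
      </definition>
    </group>
  </definitions>
</configuration-values>"""

config = ET.fromstring(DRIVER_CONFIG)

# One step at a time, one predicate per step.
group = config.find("definitions/group[@name='GroupName']")
definition = group.find("definition[@name='SomeGCV']")
third_item = definition.find("value/item[3]")

# The same thing as a single path with all three predicates.
same_item = config.find(
    "definitions/group[@name='GroupName']"
    "/definition[@name='SomeGCV']/value/item[3]")

print(third_item.text, same_item is third_item)
```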

As always (as with an LDAP filter), splitting the expression at its enclosing brackets will help break it into smaller chunks to understand. So our example for Attribute above of:
((((add-attr|attr)[@attr-name = 'CN']) | (modify-attr[@attr-name = 'CN']/add-value)) | query:readObject($srcQueryProcessor, association, @src-dn, @class-name, 'CN')/attr/value)

Could be broken up as:
((add-attr|attr)[@attr-name = 'CN'])
| (modify-attr[@attr-name = 'CN']/add-value)
| query:readObject($srcQueryProcessor, association, @src-dn, @class-name, 'CN')/attr/value

Hmm, that makes more sense to me. You have add-attr|attr in one test, then OR the modify-attr test, and all of that is enclosed in brackets, so it is those three options OR the query call. Not sure how well that works displayed.

When I work on other articles and come across an interesting or complex XPATH, I have been trying to take it apart to explain it better, so keep reading my articles to find more XPATH examples.


Comment List
  • This "(modify-attr[@attr-name = 'CN']/add-value)" might not return a boolean as you expect. Sometimes XPath will not do boolean on a string, and you'll have to do something like (string-length(modify-attr[@attr-name = 'CN']/add-value)>0) to get it to work.
  • A great overview of XPATH use in IDM - this certainly got me on the right track.

    I do notice one example missing right after the paragraph starting with:
    "Thus you could set local variable (type nodeset) TEST to Query for the ‘DirXML-Associations’ values for an object."
    Just thought I'd pass that along.

    Thanks again for your perspective on this subject!