Using String Compares in XPATH Statements


Novell Identity Manager supports several languages for manipulating events that come across as XDS documents. DirXML 1.x introduced the use of XSLT style sheets, which are still supported, and you can still see XSLT rules in some of the drivers. NSure Identity Manager 2.0 introduced DirXML Script, an XML-based language that is very well suited for manipulating events.

Common across both languages has been the use of the XML path language, XPATH. One of the biggest issues with using XPATH in Identity Manager installations is that most documentation and examples are written with a web or HTML focus. That is all fine and good in the abstract, but when you try to apply it to real world Identity Manager examples, much of the online documentation is not that helpful.

It is always useful to go straight to the source and look at the specification for XPATH 1.0, which is published by the W3C as the Recommendation "XML Path Language (XPath) Version 1.0".

In order to make XPATH more available and understandable for others, I have been working on a series of articles about interesting things in XPATH, as they relate to Identity Manager.

One of the neat things to do with XPATH is string manipulation that the default DirXML Script token set does not include. For example, DirXML Script includes a substring token, but that is position based, i.e. substring from position 3 for 12 characters. This is very useful, but sometimes you want a slightly different substring function, say substring before the @ sign in an email address, to get the username part of the address, or substring after the @ sign, to get the domain name.

XPATH handily enough has a pair of string functions, substring-before() and substring-after(), to help out with this. Thus, you could have the email address in a local variable EMAIL-ADDRESS and then call the XPATH functions substring-before($EMAIL-ADDRESS,"@") or substring-after($EMAIL-ADDRESS,"@") to get the information from the email address that we just discussed.
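As a minimal sketch in DirXML Script, assuming the email address is already in the local variable EMAIL-ADDRESS, the two calls could be wired up as below. The result variable names EMAIL-USER and EMAIL-DOMAIN are made up for illustration.

```xml
<!-- EMAIL-USER and EMAIL-DOMAIN are hypothetical names; EMAIL-ADDRESS is assumed to be set earlier. -->
<do-set-local-variable name="EMAIL-USER" scope="policy">
  <arg-string>
    <!-- Everything before the first @ sign: the username part -->
    <token-xpath expression='substring-before($EMAIL-ADDRESS,"@")'/>
  </arg-string>
</do-set-local-variable>
<do-set-local-variable name="EMAIL-DOMAIN" scope="policy">
  <arg-string>
    <!-- Everything after the first @ sign: the domain part -->
    <token-xpath expression='substring-after($EMAIL-ADDRESS,"@")'/>
  </arg-string>
</do-set-local-variable>
```

Keep in mind that if the string has no @ sign at all, both functions return an empty string, so it is worth testing for that before using the results.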

One big issue to watch out for in XPATH is that the functions are case sensitive. We get a little spoiled, since most string compare/tests in DirXML Script default to being case-insensitive. That is, a value of "Brian" is the same as "brian", is the same as "BrAiN" in most cases. In XPATH, this is not the case. Usually to get around this I just upper case the two strings I am about to compare while they are in local variables, just to be on the safe side.
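Here is a rough sketch of that upper case approach, assuming the two strings are already sitting in local variables. The names NAME-A and NAME-B (and the -UC copies) are hypothetical, and the trace message is only there to give the test something to do.

```xml
<!-- NAME-A and NAME-B are hypothetical local variables holding the two strings. -->
<do-set-local-variable name="NAME-A-UC" scope="policy">
  <arg-string>
    <token-upper-case>
      <token-local-variable name="NAME-A"/>
    </token-upper-case>
  </arg-string>
</do-set-local-variable>
<do-set-local-variable name="NAME-B-UC" scope="policy">
  <arg-string>
    <token-upper-case>
      <token-local-variable name="NAME-B"/>
    </token-upper-case>
  </arg-string>
</do-set-local-variable>
<do-if>
  <arg-conditions>
    <and>
      <!-- Both sides are upper case now, so the XPATH compare ignores the original case -->
      <if-xpath op="true">$NAME-A-UC = $NAME-B-UC</if-xpath>
    </and>
  </arg-conditions>
  <arg-actions>
    <do-trace-message level="1">
      <arg-string>
        <token-text xml:space="preserve">Names match, ignoring case</token-text>
      </arg-string>
    </do-trace-message>
  </arg-actions>
</do-if>
```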

There are a bunch of other string processing functions defined in XPATH 1.0, and some can be very useful, like contains(). A good example of how to use contains() is in the article "Using Global Configuration Values in XPATH", where the idea is to use contains() to recreate the "in subtree" test that we get in DirXML Script for Source DN and Destination DN. The reason you might need this is that the "in subtree" test is only applicable to the current object's source or destination DN. What if you had another object, say a group the user is a member of, and you wanted to test whether the group is in a particular subtree? You could set the value of the attribute into a local variable, and then test whether the XPATH contains($GROUP-DN,"\ACME\GROUPS\HR") is true or false.
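As a sketch, the subtree test could sit in a rule's conditions like this, assuming GROUP-DN was set with policy scope by an earlier rule (conditions are evaluated before the rule's own actions run):

```xml
<!-- Assumes the local variable GROUP-DN already holds the group's DN in slash format. -->
<conditions>
  <and>
    <if-xpath op="true">contains($GROUP-DN,"\ACME\GROUPS\HR")</if-xpath>
  </and>
</conditions>
```

Two things to watch for: contains() matches the text anywhere in the string, not just at the start, and like the other XPATH string functions it is case sensitive.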

There are many other functions that are useful like:

string() which converts the value you give it to a string data type. Usually not necessary, but good to have available. Like number(), which I discussed in "XPATH and math", it is not always needed, but nice to have!

concat() which concatenates strings together. It takes two or more strings, and outputs the concatenated string.

starts-with() is great for finding strings that start with some value. Alternatively, you could use a regular expression compare in DirXML Script and look for ^something, where the caret symbol (^) indicates that the match needs to be at the beginning of the string.

string-length() is very useful when you are trying to validate data. If you know the location code from the HR system is always four characters, a very quick test is whether the XPATH string-length($LOC-CODE)=4 is true.

normalize-space() is a great function. It collapses each run of white space into a single space, and strips off leading and trailing white space. Very useful for cleaning up data coming from a database or the like, where a leading or trailing space is technically valid in the source, but you would prefer not to carry it along.

substring-before(), substring-after(), and contains() we already talked about above.

translate() is a function I do not often use. It replaces certain characters with others, notionally to allow simple case conversion, but it is not sophisticated enough to do that for all languages, and the XPATH 1.0 specification notes that a future version may provide better functions for case conversion.
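To pull a few of these together, here is a sketch of a single rule that uses several of the functions above. All of the local variable names (LOC-CODE, RAW-NAME, GIVEN, SURNAME and the ones being set) and the "NY" prefix are made up for illustration, and the input variables are assumed to have been set by an earlier rule.

```xml
<!-- LOC-CODE, RAW-NAME, GIVEN and SURNAME are hypothetical local variables set by an earlier rule. -->
<rule>
  <description>Sketch: validate and clean up strings with XPATH functions</description>
  <conditions>
    <and>
      <!-- Location code must be exactly four characters -->
      <if-xpath op="true">string-length($LOC-CODE)=4</if-xpath>
      <!-- and must begin with an (assumed) site prefix -->
      <if-xpath op="true">starts-with($LOC-CODE,"NY")</if-xpath>
    </and>
  </conditions>
  <actions>
    <!-- Strip leading and trailing white space, collapse internal runs to one space -->
    <do-set-local-variable name="CLEAN-NAME" scope="policy">
      <arg-string>
        <token-xpath expression="normalize-space($RAW-NAME)"/>
      </arg-string>
    </do-set-local-variable>
    <!-- concat() is happy with more than two arguments -->
    <do-set-local-variable name="FULL-NAME" scope="policy">
      <arg-string>
        <token-xpath expression='concat($GIVEN," ",$SURNAME)'/>
      </arg-string>
    </do-set-local-variable>
    <!-- translate() as a poor man's upper case: only covers the unaccented letters a to z -->
    <do-set-local-variable name="LOC-CODE-UC" scope="policy">
      <arg-string>
        <token-xpath expression='translate($LOC-CODE,"abcdefghijklmnopqrstuvwxyz","ABCDEFGHIJKLMNOPQRSTUVWXYZ")'/>
      </arg-string>
    </do-set-local-variable>
  </actions>
</rule>
```

The translate() call maps each character in the second argument to the character at the same position in the third, which is why it only handles the letters you explicitly list.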

One problem with using string functions in XPATH is that you cannot do wildcard compares. That is, you cannot compare $VARNAME=test* and expect it to work. In XPATH the asterisk (*) is meant for node tests, where a single asterisk means any node of the principal node type of the given axis. For example, child::* matches any child element, and attribute::* or @* matches any attribute.

In the Identity Manager implementation of XPATH (which is 1.0, not XPATH 2.0), you can however use methods from java.lang.String that support regular expressions as extension functions, e.g. modify-attr[jstring:matches(@attr-name,'ACMEPROFILE.*')] (where the jstring prefix has been mapped to the java.lang.String class in a namespace declaration) or modify-attr[java.lang.String:matches(@attr-name,'ACMEPROFILE.*')]. This is functionality provided by DirXML Script and is not available in XSLT.
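As a sketch of how the test* case might be handled, here are two alternatives side by side, assuming a hypothetical local variable VARNAME set by an earlier rule. The first uses DirXML Script's own regular expression compare, the second follows the java.lang.String:matches() form shown above. I have not tied this to a particular engine version, so treat it as a starting point rather than a tested policy.

```xml
<!-- VARNAME is a hypothetical local variable; either condition alone would do. -->
<conditions>
  <or>
    <!-- DirXML Script regular expression compare on the variable -->
    <if-local-variable mode="regex" name="VARNAME" op="equal">test.*</if-local-variable>
    <!-- The same pattern through the Java extension function form -->
    <if-xpath op="true">java.lang.String:matches($VARNAME,'test.*')</if-xpath>
  </or>
</conditions>
```

Note that the pattern is a regular expression, so the wildcard is written as .* rather than a bare asterisk.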

Overall XPATH gets us a lot of functionality. The only real issue is that it is more complex than other approaches. However, it is good to have multiple approaches in your toolkit for when you need them!


Comment List
  • Its Pinky and the Brian, Pinky and the Brain. One is a genius, the others insane. They're laboratory mice, their genes have been spliced. they're inky, they're dinky, they're Pinky and the Brain, Brian, Brian, Brain...

    Good catch, my way is funnier though I think! :)

    Spell check? We don't need no steenkin spell check! :)
  • When discussing case insensitivity, you made a mistake in the spelling of brian the last time, spelling it brain instead. Regardless of case sensitivity, brian and brain are not the same.