Data Collection Service Driver Walkthrough - Part 8

With Novell Identity Manager 4.0 there are a number of new features available. You can read more about those features in these articles:

There are four new drivers: two are for newly supported connected systems ( and Sharepoint), and two are service drivers that are needed for the Reporting module.

These two service drivers are the Managed System Gateway (MSG) driver and the Data Collection Service (DCS) driver, the first of which you can read about in this series of articles:

The Data Collection Service driver is the second half of that pairing, and will be the topic of this series. Both of these drivers are meant to enable the Reporting module to get enough information about the system to report upon it. The MSG driver is focused more on providing information about how the drivers are configured (heck, it even tries to infer the matching rule criteria by reading the rules out of the objects), while the DCS driver is focused more on collecting events about objects for storage in the Reporting database.

I thought I had finished this all up, but I had been browsing the Packages in a fresh project, since that seems like the only time you get to see the Readme option. In order to see it, you need to select the Package Catalog from the Outline view, right-click, and select Import Package.

The user interface at that point has a Show Readme option, which opens a small side bar that has the Readme contents for the package. I find it odd that it is not so easy to see this when adding a new driver via Packages. It seems like I would want to be able to see this info anytime I need to select a Package, not just when I import them the first time into my Package Catalog.

Initially I found this because Designer, on a restart, asked to look for new Packages, and when I let it, it found a whole stack of them. I would estimate about a dozen or more Package updates have come out since the release of Identity Manager 4 in Oct of 2010. This is a really good sign, as some are updates that apply bug fixes, and others add new functionality to drivers after the fact.

On one of those updates, I noticed a Data Collection Service Driver Scoping package was available. When I looked at the Readme option, as I went to import it into my Package Catalog, I saw this:


This package provides static and dynamic scoping capabilities for enterprise environments with multiple driversets and multiple pairs of Data Collection Service Drivers and Managed System Gateway Drivers.

During or after installation, the role for the Data Collection Service Driver the package is being installed on must be selected:

Primary: The driver synchronizes everything except subtrees of other driver sets. A primary Data Collection Service Driver may well service a whole identity vault or it may work in conjunction with one or multiple secondary drivers.

Secondary: The driver synchronizes only its own driver set but nothing else. A secondary Data Collection Service Driver usually requires a primary driver to run in a different driverset or no data outside the local driver set is sent to the Data Collection Service.

Custom: Allows the administrator to define custom scoping rules. The only implicit scope is the local driver set, everything else is considered out-of-scope unless it is explicitly added to the list of custom scopes. A custom scope is the distinguished name in slash format of a container in the identity vault whose subordinates or subtree should be synchronized.

What's New

- Novell Identity Manager 4.0 FCS (via online update) release

Now that's interesting! A distributed model, with a single Primary, or a Primary and distributed secondaries. I am very curious as to how they do this!

I had written these notes down, and they reminded me to go look and see how this actually works under the covers. But when I went to find the Primary/Secondary setting, I could not find it in the driver I had imported and been using for this series of articles. So I double checked in case it was actually on the Managed System Gateway driver, which you can read more about using the links at the top of this article. Nonetheless, it was not there either. I was sure I had seen this text, and I had recorded it in my notes, yet I could not find the settings.

Since I knew that I could only see the Readmes when importing into a Package Catalog, I went to a spare project (actually an IDM 3.6.1 project, so of course I had not imported any packages into the catalog there) and opened up the Import Package dialog.

Looking at the Data Collection Service driver, this text was not in its readme. However, there is a tick box at the bottom, Show only Base Packages, that defaults to on. This is probably wise, since when you decide to add a driver, you usually want to start by selecting from the Base packages, and select later from all the available sub-packages. However, in this case, I wanted to see them all.

There it was, the Data Collection Service Driver Scoping package. Looking at the readme, there was the text I was expecting. Much better, I am not losing my mind (yet!).

Well, it's a package, so let's go to the Data Collection Service (DCS) driver, open the driver properties, and select the Packages link in the left column. This lets you see the packages this driver is currently using. Additionally, in the upper right hand corner there is a Plus sign for adding new packages. This is also the place where, if an updated package was available, you could choose to upgrade the current driver to use it. Then if you realized you did not like the new functionality, you could downgrade back to a previous version. Packages are great!

Looks like this Package adds at least two things I have found so far. A Global Configuration Variable (GCV) object, called Scoping, and a policy object in the Subscriber Event Transformation Policy set named:


The GCV is sort of interesting as it is an 'enum' type GCV, where you get to select from a list of values that are prettied up. The choices are Primary, Secondary, and Custom. The real values are primary, secondary, and custom (basically all lower case versions). If you select Custom, you get shown a structured GCV which is one of my favorite types.

This structured GCV has two GCVs defined inside it: one for the DN of the subtree this driver should be reporting on, and a second for a Scope value, which can be base (value of base), subordinates (one), or subtree (sub). Thus you can select multiple objects or containers and have this driver report on just those.

As always, the configuration is defined in the GCVs, but the work is done in the policy. So let's look at that Subscriber Event Transform policy object, where we see four rules.

Initialize scope rule (Primary):

The conditions are interesting. It starts with a check if the above GCV is set to Primary and then validates that it has not been run through before (it is an initialization rule) by checking for the availability of one of its local variables (NOVLDCSSCPNG__EXCLUDE__SUB).

In the policy it sets NOVLDCSSCPNG__EXCLUDE__SUB to a nodeset that starts as <roots/>, then gets the driverSetDN by chopping the last node off the GCV value.

Next up it loops through all DirXML-DriverSet objects in the tree, and if the XPath test $current-node/@src-dn!=$driverSetDN is true (that is, the src-dn of this Driver Set object is NOT our driver's driver set), it adds a <root> node to the NOVLDCSSCPNG__EXCLUDE__SUB variable under the <roots> node, with the text value of the current driver set object's DN.

Thus it builds a list of driver sets to ignore. Very nice approach. I would probably have used the Append XML Element and Append XML Text actions to update the NOVLDCSSCPNG__EXCLUDE__SUB variable, but this approach of building it as a text string, then XML Parsing it and copying it into the original variable with a Clone By XPath action, will work much the same way. I am not sure if there is a real performance issue here, and since there are unlikely to be more than one or a few driver sets per eDirectory tree, I am not concerned.
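To make the Primary initialization logic concrete, here is a rough Python sketch of the same idea. It is not the package's DirXML Script. The function names, the backslash-separated DN layout, and the sample tree are all my own assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def parent_dn(dn, sep="\\"):
    """Chop the last component off a slash-format DN (driver DN -> driver set DN)."""
    return dn.rsplit(sep, 1)[0]


def build_exclude_roots(driver_dn, driver_set_dns):
    """Return a <roots> element holding one <root> per *other* driver set."""
    own_driver_set = parent_dn(driver_dn)
    roots = ET.Element("roots")
    for dn in driver_set_dns:
        # Plain string comparison, like the XPath test src-dn != $driverSetDN
        if dn != own_driver_set:
            ET.SubElement(roots, "root").text = dn
    return roots


# Hypothetical tree with two driver sets (names are made up)
driver_dn = r"\TREE\system\DriverSetA\DCS Driver"
driver_sets = [r"\TREE\system\DriverSetA", r"\TREE\branch\DriverSetB"]

exclude = build_exclude_roots(driver_dn, driver_sets)
print(ET.tostring(exclude, encoding="unicode"))
```

The sketch appends nodes directly rather than building a string and parsing it, which is the alternative I said I would probably have reached for. Either way the result is the same small list of driver sets to skip.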

For the Secondary and Custom case, some initialization is needed.

Three variables are set, NOVLDCSSCPNG__ ending in BASE, ONE and SUB. These get initialized as <singles/>, <parents/>, and <roots/> nodes respectively. The Driver Set DN is retrieved the same as above, and the NOVLDCSSCPNG__SUB variable gets the current driver set added as a <root> node.

In the custom case, it now parses the Structured GCV values. This actually demonstrates that the author of this rule is either unaware of how a Structured GCV looks when treated as a node set, or chose to ignore it. If they had set the value into a node set local variable, they could loop through it and use XPath to select what they wanted. When you treat it in a string context, you get a delimited string that you then have to tease into its parts.

Basically the delimiter between instances is a plus sign (+) and the delimiter between values inside each instance of the structured GCV is the equal sign (=). Thus they use the Split token, with the plus sign as the delimiter, to generate a nodeset to loop through. Then inside each instance, they pick apart the two GCV values (the DN and the scope) separated by the equal sign. So substring before the equal sign is the objectDN, and substring after the equal sign is the scope value.

Then for each of the three possible scope values, base, one, or sub, there is a task to perform. They all come down to the same thing: add the objectDN text as a node in the matching NOVLDCSSCPNG__* variable so we know what is in scope. This follows the same approach as described above.
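The string-splitting approach can be sketched in a few lines of Python. This is only an illustration of the parsing described above, not the actual rule. The function name, the dictionary standing in for the three NOVLDCSSCPNG__* variables, and the sample DNs are all assumptions of mine.

```python
def parse_custom_scopes(gcv_text, driver_set_dn):
    """Split a structured GCV rendered as a string into base/one/sub DN lists."""
    # The local driver set is always in scope as a subtree root.
    scopes = {"base": [], "one": [], "sub": [driver_set_dn]}
    for instance in gcv_text.split("+"):          # one instance per '+'
        if not instance:
            continue
        # substring-before / substring-after the first '='
        object_dn, _, scope = instance.partition("=")
        if scope in scopes:                       # ignore unknown scope values
            scopes[scope].append(object_dn)
    return scopes


# Hypothetical GCV content: three custom scopes
gcv_text = r"\TREE\org\users=sub+\TREE\org\admin=base+\TREE\org\groups=one"
scopes = parse_custom_scopes(gcv_text, r"\TREE\system\DriverSetA")
for name, dns in scopes.items():
    print(name, dns)
```

Note that this only works as long as the DNs themselves never contain a plus or an equal sign, which is one more reason looping over the GCV as a node set would have been the safer choice.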

Next up we have two rules, to handle the Primary case and then the Secondary or Custom case.

When it is a primary driver (via the GCV setting), the rule looks only at the NOVLDCSSCPNG__EXCLUDE__SUB variable. It loops through the values in it, which is a list of DNs, and checks if the source DN is in subtree $current-node$. This is a reasonable approach, because the 'in subtree' test is pretty clever. It handles case issues, and more than one level of depth. The alternate approach is to try a contains($SRC-DN,$current-node) style XPath test. This however is case sensitive across the entire string, which would mean using the lower case token on both variables first before comparing.

It's a shame there is no easy way to compare a nodeset of values to a DN and return whether it is in any of the subtrees. That would be really handy to have. You can use the if local variable test to test a single string against a nodeset of values. If it matches any of the values, then the test returns true. If it matches none, it returns false. The nice part is that this would be a case insensitive test, but for DNs spacing can be a real issue as well, especially in LDAP format.

Regardless, you can see that the If Source DN in subtree test is already a simpler approach.
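To show why a component-wise subtree test beats a plain substring test, here is a small Python sketch. It is my own approximation and not how the engine implements 'in subtree'. In particular, treating a container as being inside its own subtree, and the backslash-separated DN format, are assumptions.

```python
def components(dn, sep="\\"):
    """Split a slash-format DN into lower-cased, trimmed components."""
    return [part.strip().lower() for part in dn.strip(sep).split(sep)]


def in_subtree(src_dn, container_dn):
    """True if src_dn is the container or sits anywhere beneath it."""
    src, base = components(src_dn), components(container_dn)
    return src[:len(base)] == base


def naive_contains(src_dn, container_dn):
    """The contains()-style alternative: a case-sensitive substring match."""
    return container_dn in src_dn


# Case differences: the component-wise test copes, the substring test does not
print(in_subtree(r"\TREE\Org\Users\jsmith", r"\tree\org\users"))
print(naive_contains(r"\TREE\Org\Users\jsmith", r"\tree\org\users"))

# Sibling container with a similar name: the substring test is fooled
print(in_subtree(r"\TREE\org\users2\jsmith", r"\TREE\org\users"))
print(naive_contains(r"\TREE\org\users2\jsmith", r"\TREE\org\users"))
```

Besides the case problem, the substring version also matches a sibling container whose name merely starts with the same text, so lower-casing both sides alone would not be enough to make it safe.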

It is interesting to note that the action in that case is to both veto() and break() which is unnecessary, and probably a typo as much as anything else.

For the secondary and custom driver configurations, the driver loops through the other variables built in the previous rules, and does the same test. This time it specifically uses break() instead of veto, since if it is in any of the scopes, we want to let it through. If it gets through all three variables, then it is vetoed, since it is not within any of the specified scopes.
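Putting the two filtering rules side by side, the decision logic looks roughly like the Python sketch below. Again, this is my reading of the policy rather than a translation of it. The function names and sample DNs are hypothetical, and returning False stands in for veto while returning True stands in for break (let the event through).

```python
def parts(dn, sep="\\"):
    """Split a slash-format DN into lower-cased components."""
    return [p.strip().lower() for p in dn.strip(sep).split(sep)]


def in_subtree(src_dn, root_dn):
    src, root = parts(src_dn), parts(root_dn)
    return src[:len(root)] == root


def primary_allows(src_dn, excluded_roots):
    """Primary: veto anything under another driver set, allow everything else."""
    for root in excluded_roots:
        if in_subtree(src_dn, root):
            return False  # veto
    return True


def scoped_allows(src_dn, base, one, sub):
    """Secondary/Custom: allow only what matches a scope, veto the rest."""
    src = parts(src_dn)
    if any(src == parts(dn) for dn in base):       # the object itself
        return True
    if any(src[:-1] == parts(dn) for dn in one):   # direct subordinates
        return True
    if any(in_subtree(src_dn, dn) for dn in sub):  # whole subtree
        return True
    return False  # fell through all three lists: veto


# Hypothetical scope lists
excluded = [r"\TREE\branch\DriverSetB"]
print(primary_allows(r"\TREE\org\users\jsmith", excluded))
print(primary_allows(r"\TREE\branch\DriverSetB\AD Driver", excluded))
print(scoped_allows(r"\TREE\org\groups\admins", [], [r"\TREE\org\groups"], []))
print(scoped_allows(r"\TREE\org\groups\nested\admins", [], [r"\TREE\org\groups"], []))
```

The asymmetry is the interesting part: the Primary rule is a deny list with a default of allow, while the Secondary and Custom rule is an allow list with a default of veto.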

In some of the other drivers, a minor tweak is needed in the Input transform. For the Active Directory driver we have: NOVLDATACOLL-itp-DataCollectionQuerySupport

This rule looks for queries with the operation property data-collection-query. It then rewrites the type="dn" attribute, which tells the engine to look for associated objects, to say {dn} instead, and removes the association-ref XML attribute.

Basically this is needed so that the non-associated objects do not get stripped out of the query results. This can be very frustrating when an attribute you really just care about as, say, a string keeps getting stripped because the object it refers to is not associated with the driver.

I ran into this in my plural DirXML-ADContexts and DirXML-ADAliasNames, since I made them Path syntax, where the DN component is the Driver's DN. To get around it, I added an association to the driver object with some bogus value, and in the input transform, added the association-ref to the node with that value. This way the engine let it through. I am sure there is a simpler approach, perhaps stuffing the information into operation properties and then putting it back in later.
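As a rough illustration of the rewrite this rule performs, here is a Python sketch using ElementTree. It is a guess at the shape of the transformation, not the package's policy. The sample document, the dictionary standing in for operation properties, and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET


def relax_dn_values(doc, op_props):
    """For data collection queries, hide DN typing from the engine."""
    if "data-collection-query" not in op_props:
        return  # ordinary query results are left untouched
    for el in doc.iter():
        if el.get("type") == "dn":
            el.set("type", "{dn}")                  # no longer treated as a DN reference
            el.attrib.pop("association-ref", None)  # drop the association lookup hint


# Hypothetical query result with a DN-typed value
doc = ET.fromstring(
    '<instance class-name="user">'
    '<attr attr-name="manager">'
    '<value type="dn" association-ref="abc123">CN=Boss,OU=Users,DC=example,DC=com</value>'
    '</attr></instance>'
)
relax_dn_values(doc, {"data-collection-query": "true"})
print(ET.tostring(doc, encoding="unicode"))
```

With the type changed and the association-ref gone, there is nothing left for the engine to resolve, so the value survives whether or not the referenced object is associated.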

You can read more about the plural context approach in this series of articles:

There is also a very interesting tweak in the Schema Map policy set for the AD driver. The rule NOVLDATACOLL-smp-SkipSchemaMapping is linked in twice, once before, and once after the schema map rule. Interesting eh?

There are two basic rules, one to hide the data, and one to unhide it. Skip and restore. The basic idea is to NOT Schema map the attributes in a DCS driver query. They want to get back the raw data from these queries, not the eDir name space versions.

The skip rule reformats the attr-name="CN" to look like attr-name="{CN}" so that the schema map rule will ignore it. Then the restore rule sets it back.

The Skip rule runs before the schema map policy fires, sets the operation property restore-attr-names to true, and removes the data-collection-query operation property.

Then the Restore rule looks for the restore-attr-names operation property to know that it needs to fix the attribute names back up.
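The skip and restore trick is easy to model in Python. The sketch below is my own simplified stand-in: operation properties become a dictionary, the schema map becomes a plain lookup table, and the sample query and mapping are hypothetical.

```python
import xml.etree.ElementTree as ET


def skip_schema_mapping(doc, op_props):
    """Runs before the schema map: brace attr-names so the map ignores them."""
    if op_props.pop("data-collection-query", None) is None:
        return  # not a data collection query, map as usual
    for el in doc.iter():
        name = el.get("attr-name")
        if name is not None:
            el.set("attr-name", "{" + name + "}")
    op_props["restore-attr-names"] = "true"


def schema_map(doc, mapping):
    """Stand-in for the real schema map: rename attr-names it recognizes."""
    for el in doc.iter():
        name = el.get("attr-name")
        if name in mapping:
            el.set("attr-name", mapping[name])


def restore_attr_names(doc, op_props):
    """Runs after the schema map: strip the braces again."""
    if op_props.get("restore-attr-names") != "true":
        return
    for el in doc.iter():
        name = el.get("attr-name", "")
        if name.startswith("{") and name.endswith("}"):
            el.set("attr-name", name[1:-1])


query = ET.fromstring('<query><read-attr attr-name="CN"/></query>')
op_props = {"data-collection-query": "true"}
mapping = {"CN": "cn"}  # hypothetical schema mapping entry

skip_schema_mapping(query, op_props)
schema_map(query, mapping)          # sees "{CN}", finds no match, does nothing
restore_attr_names(query, op_props)
print(query.find("read-attr").get("attr-name"))
```

Because the braced name never matches an entry in the map, the attribute name comes out the far side exactly as it went in, which is why linking the same policy object in before and after the schema map rule is all that is needed.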

Very clever approach.

As you have seen in this series, the Data Collection Service driver is quite an interesting approach. I hope you have enjoyed this as much as I did!
