Adding IDM4 Reporting support to pre-IDM4 drivers - Part 1

Novell Identity Manager 4 has a bunch of new features. You can read what I think of them in this series of articles:

Based on requests from customers, the feature most are interested in is Reporting. That is, most of our customers considering an upgrade to IDM4 are doing it for the new Reporting features.

Now reporting is an interesting thing. It is at one level simple, at another quite complicated.

Simply put, there are a couple of drivers that are used to get data out of the Identity Vault. There is also the Event Auditing Service that can collect other event-like data, and it all gets stored in an Identity Warehouse.

Then you use Jasper iReport (an open source tool, the same idea as Business Objects Enterprise (BOE, the product formerly known as Crystal Reports)) to customize and run reports against the database.

If you have seen Sentinel and Sentinel Log Manager, you will recognize bits and pieces of each blending over.

However, that was the simple version of how it works. When you get down in the weeds of how it all works, it gets complicated.

The two new drivers are the Data Collection Service driver and the Managed System Gateway driver. Lucky for you, I spent many an hour on the bus ride in to work walking through those drivers, so you can see what is happening under the covers. (After you recover from the horrors, you might still be reading my work.) You can find those articles at these links:

Data Collection Service:

Managed System Gateway:

However, to summarize, one of the drivers is meant to have a filter that is open to all the various classes and attributes of interest, and each event is forwarded on to the event warehouse.

The other driver is more of a push sort of thing (which is how the documentation describes it) that uses an interesting feature that IDM drivers support: injecting an event into another driver's queue. That is, this driver will submit a query into, say, an Active Directory or SAP UM driver, to get back some information. This way, when the Reporting module needs more information, it can leverage the fact that there are existing drivers to make queries into those systems, using the already established connections.

If you imagine how the Analyzer product (also newly available in IDM 4 (it was available before, but almost impossible to get)) handles this issue, you can see why this is probably a better approach for Reporting. Analyzer pretty much expects you to run a second Remote Loader instance for it to use to generate its queries against the various connected systems. In that case, large queries (all users, all groups, etc.) are probably common, so that approach may make more sense there, whereas here the queries are probably going to be more direct and smaller.

Thus the Reporting module can get events as they happen, and query for more information when it requires it. On top of that, it will catch events of the sort that the Sentinel infrastructure can handle (though likely limited, for licensing reasons, to IDM-related events) and store it all together.

So having said all that, how do you get Reporting going in your existing IDM 3.6.1 solution? Obviously you need to upgrade to IDM 4 Advanced Edition (though by the time this is published, Standard Edition (SE) should be available, but it will have scaled-down Reporting, among other differences) to get Reporting installed.

The good news is that, aside from support for things needed in Packaging, the engine really did not change much between IDM 3.6.1 and IDM 4. The only actual new feature I have found is a single new automatic Global Configuration Variable (GCV). Now that is a BIG aside, as support for Packaging changes all sorts of interesting things at the lowest level of the engine, and adds a ton of new stuff in Designer to manage it. Thus you can understand why basic engine functions (like new tokens, actions, or the like) were not really touched aside from bug fixes. (A fun example: a variable holding a funny character, meant to be used in a regular expression, expands incorrectly and does not work in IDM 3.6.1 but now does work in IDM 4, and stuff like that.)

You can read more about Packages, since they really are a big, big deal in IDM 4, and are a lovely idea that should have been here years ago, in these articles:

Thus existing drivers that you have up and running, tested, and are happy with, and see no need to change, should basically continue to just plain work. I have yet to see anything that breaks in IDM 4, nor do I see any changes that would lead me to think otherwise. (Yet! I shall of course report anything I should find.)

Now, if you were to start a new system from scratch and use the Package versions of the drivers, you would get a choice to use the Packages that add support for Reporting. But what do you do with your existing drivers that are happily working on your new IDM 4 system and you have no desire to run through the build and test process again to start them off with Packages?

Turns out, if I read the tea leaves correctly, this should be relatively straightforward. There was a previous analogous case in IDM, with the move in RBPM 3.7 to the use of Resources as an abstraction on top of Entitlements. At that time, two Novell folk wrote a great Cool Solution explaining how to convert your existing driver in this article: Convert Driver Entitlements to New RBPM 3.7 Resource Model

They explained WHAT you need to do, and then I wrote an article explaining WHY this is needed, and what it is doing in the background.
Converting Entitlements to Resources, more details

In this case, I believe there are at least four things needed to support Reporting in an existing driver.

Three Policy objects:

One Global Configuration Variable object:

You can follow along by opening a Project in Designer 4, making an Identity Vault that is set to the IDM 4 level, and then importing an Active Directory driver using Packages. There you will find all the policies and objects I am going to discuss.

The first policy object, NOVLDATACOLL-itp-DataCollectionQuerySupport, as the name suggests, is part of the NOVL-provided DATACOLL (Data Collection) package, and it resides in the Input Transform as a DirXML Script policy object (hence the itp in the name).

In this case, the only rule is nicely named as: Rename @association-ref to @association and change @type from "dn" to "{dn}"

Basically, if a query has come through on the Subscriber channel (injected by the Data Collection Service driver into this driver's queue), then it will have an operation property named data-collection-query set to true.

In that case, it is likely that the returned data may have attributes of type='dn'. If you have seen these in other drivers, you will know that the driver will try to add an association-ref XML attribute to the document with the value's association, so that the engine can then resolve it to an eDirectory DN, since DN-syntax attributes need to reference real objects. If the association-ref does not resolve to something in eDirectory, then the value is dropped from the document.

Normally this is good behavior and what you would want. However, the DCS driver is meant to allow the Reporting engine to send a query for, say, all Groups into the driver and get the answer back. But what if you are not synchronizing Group objects? Then in the response document all the Groups would pick up association-refs that do not resolve, and get dropped. Or perhaps the Member list, which is all DN-syntax attributes, might have members who are not synchronized. In that case those values would get dropped, and the value of the query and its results would be diminished.

What this rule does is reformat the XML attribute, setting the type XML attribute to type='{dn}', which the engine will not treat as type='dn', and thus it will not throw away the data if the object is not actually associated. Also, the association-ref XML attribute is removed, and its value is placed in an XML attribute called association.
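To make the reformatting concrete, here is a small Python sketch of the equivalent transformation, using a simplified instance document I made up for illustration (real DirXML instance documents carry more structure, and the actual rule is DirXML Script, not Python):

```python
import xml.etree.ElementTree as ET

# Invented sample: a query result with a DN-syntax value that the
# driver has tagged with an association-ref.
doc = ET.fromstring(
    '<instance class-name="Group">'
    '<attr attr-name="Member">'
    '<value type="dn" association-ref="cn=bob,o=acme">cn=bob,o=acme</value>'
    '</attr>'
    '</instance>'
)

for value in doc.iter('value'):
    if value.get('type') == 'dn' and 'association-ref' in value.attrib:
        # {dn} is not a real syntax type, so the engine will not try to
        # resolve (and possibly discard) the value.
        value.set('type', '{dn}')
        # Move the association value into a plain "association" attribute
        # and drop association-ref entirely.
        value.set('association', value.attrib.pop('association-ref'))

print(ET.tostring(doc, encoding='unicode'))
```

After the loop runs, the value survives even if the object is not associated in eDirectory, which is exactly what the Reporting warehouse needs.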

In this case, the Comment on the rule is actually pretty good, as it gives some hints (though it assumes you know the backstory), which is way better than in the past, when we got almost nothing. The comment says:

"The engine tries to resolve the association in @association-ref of values with @type="dn" to an object in the IDV and removes the value if the association cannot be resolved. For data collection to work, we need all values to be returned at all times."

Next up is a very clever Policy, called NOVLDATACOLL-smp-SkipSchemaMapping. Like the previous policy, it is from the same Package (NOVLDATACOLL), but this one resides in the Schema Map as a Policy object (smp). What is clever about this is that it is actually linked in twice: once before the Schema Map rule, and once after.

This makes more sense when you look at the policy. It has two rules, skip and restore. Very original naming. As in the previous example, it watches for instance documents (the response to a query) where the operation property data-collection-query is set to true. In those cases, the DCS driver would prefer to get the response to its query in the application's native schema namespace. Well, the Schema Map rule exists for exactly the opposite purpose: to convert the application schema to the correspondingly mapped eDirectory schema values.

To prevent this from happening, the first rule, skip, looks through each <attr> node for the <attr attr-name="Some Attribute"> and replaces the value with a version contained inside curly braces. So Some Attribute would become {Some Attribute}, or put another way:
<attr attr-name="Some Attribute">
<attr attr-name="{Some Attribute}">

This means that when the Schema Map rule is applied (following the first linkage of this rule) nothing should be converted from the application namespace, since the curly braces should change the names sufficiently to avoid the mapping.

It does this by looping through the XPATH of ./attr, which limits it to an <instance> document as well, since an <add> event would be full of <add-attr attr-name='Something'> nodes, and a <modify> event would be full of <modify-attr attr-name='Something'> nodes, but an <instance> document would have only <attr attr-name='Something'> nodes.
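A rough Python analogue of the skip rule shows both points at once: wrapping the attribute names in braces, and the fact that selecting only <attr> children leaves add and modify events untouched. The document shapes here are simplified samples of my own, not real engine documents:

```python
import xml.etree.ElementTree as ET

# An instance document (query response) and an add event, simplified.
instance = ET.fromstring(
    '<instance class-name="Group">'
    '<attr attr-name="Member"/><attr attr-name="Description"/>'
    '</instance>'
)
add_event = ET.fromstring(
    '<add class-name="Group"><add-attr attr-name="Member"/></add>'
)

def skip_schema_map(doc):
    # ./attr only matches <attr> children, so <add-attr> and
    # <modify-attr> nodes in add/modify events are left alone.
    for attr in doc.findall('attr'):
        attr.set('attr-name', '{%s}' % attr.get('attr-name'))

skip_schema_map(instance)
skip_schema_map(add_event)

print([a.get('attr-name') for a in instance.findall('attr')])
# -> ['{Member}', '{Description}']
print([a.get('attr-name') for a in add_event.findall('add-attr')])
# -> ['Member']  (the add event is untouched: it has no <attr> children)
```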

This is one of the reasons the Reformat Operation Attribute token is actually so powerful: it works on all three cases, which is not always obvious. For more information on the Reformat Operation Attribute token, you can read this article:
Reformat Operation Attribute

Then inside the loop, it sets the XML attribute (which is nice, as you do not have to remove the old value, just set a new one) for attr-name to the current-node variable, preceded by a { and followed by a }, thus making it miss the Schema Map policy.

Then it clears the data-collection-query operation property (so that this rule will not fire on the document as it returns on its way back), sets the restore-attr-names operation property to true, and then breaks to avoid the next rule, which is called simply "restore".
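The operation-property bookkeeping can be sketched in a few lines of Python, using a plain dict in place of real operation properties (the property names come from the article; the mechanics here are invented for illustration):

```python
def skip_bookkeeping(op_props):
    """Toy model of what the skip rule does to operation properties."""
    if op_props.get('data-collection-query') == 'true':
        # Clear the trigger so skip does not fire again on the way back...
        del op_props['data-collection-query']
        # ...and flag the document so the restore rule knows to fire.
        op_props['restore-attr-names'] = 'true'
        # Break so the restore rule does not also run on this pass.
        return 'break'
    return 'continue'

props = {'data-collection-query': 'true'}
print(skip_bookkeeping(props))  # -> break
print(props)  # -> {'restore-attr-names': 'true'}
```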

The rule "restore" is sort of the opposite of the "skip" rule, and is meant to fire as the query result returns. Now in practice this is meant to be a query on the Subscriber channel, with the reply on the Publisher channel, but this rule would actually work in both directions if the application knew to add the appropriate operation property to trigger it.

The test now is for the restore-attr-names operation property; if it is true, then this is the response to a DCS driver query, and the need now is to remove the curly braces ( {} ) from around the attribute names.

It does this with the same for-each loop through the XPATH of ./attr, which means any <attr> node in the current document context (the period), and inside the loop uses some XPATH string functions. The XPATH is along these lines:

substring-after(substring-before($current-node/@attr-name, '}'), '{')

That is, take $current-node/@attr-name (the value of the attr-name XML attribute in the current node), take the substring before the } and then the substring after the {, to get back the original name. Then use that result to set the attr-name XML attribute.
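The substring-before/substring-after pairing is easy to see in plain Python, peeling the braces back off exactly as the restore rule's XPATH does (the function name here is my own):

```python
def restore_attr_name(name):
    # substring-before(name, '}'): everything before the closing brace.
    before = name.split('}')[0]
    # substring-after(..., '{'): everything after the opening brace.
    return before.split('{', 1)[1]

print(restore_attr_name('{Some Attribute}'))  # -> Some Attribute
```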

Very clever approach, and simple once you understand it.

Next up we have a rule in the ITP called NOVLMSINFO-itp-InitManageSystemInfo that I will discuss in the next article in this series. Stay tuned.

