Data Collection Service Driver Walkthrough - Part 6

over 10 years ago
With Novell Identity Manager 4.0 there are a number of new features available. You can read more about those features in these articles:

There are four new drivers: two are for newly supported connected systems ( and Sharepoint) and two are service drivers that are needed for the Reporting module.

These two drivers are the Managed System Gateway (MSG) driver, and Data Collection Service (DCS) driver, the first of which you can read about in this series of articles:

The Data Collection Service driver is the second half of that pairing, and will be the topic of this series. Both these drivers are meant to enable the Reporting module to get enough information about the system to report upon it. The MSG driver is focused more on providing information about how the drivers are configured (heck, it even tries to infer the matching rule criteria by reading the rules out of the objects), while the DCS driver is focused more on collecting events about objects for storage in the Reporting database.

In the first article Data Collection Service Driver Walkthrough - Part 1 I started looking at how it builds a cache variable and got through most of the work it does to get the correct IP Address of the server running the Managed System Gateway driver.

In the second article Data Collection Service Driver Walkthrough - Part 2 I finished working through how the cache is built.

In the third article Data Collection Service Driver Walkthrough - Part 3 I look at how queries out of the cache are handled, and what the filter for this driver sends to the shim.

In the fourth article Data Collection Service Driver Walkthrough - Part 4 I finished up one last rule in the Input Transform that handles some error cases and started discussing the Subscriber channel.

In the fifth article Data Collection Service Driver Walkthrough - Part 5 I worked through more of the Subscriber Event Transformation policies, finishing the Event Transformation and getting through the Creation Policy set.

In this article I will work through the Subscriber Command Transform Policy set.

There is only one policy object in this policy set, the NOVLIDMDCSB-sub-ctp-EnrichEvent object.

This takes some of the data that is available in the document and reformats it in a way that the driver shim will better understand. For example, in the Subscriber Event Transformation there was a similarly named object, Enrich Event, which noted that when a modify for an unassociated object comes through, it will be converted into a synthetic add. After all, a modify of one attribute on an object that does not yet exist in the connected system really means we need to create the object through an add event.

Well as the modify event gets converted to an add event, it turns out at least one XML attribute is lost, cached-time. This is useful to know in a reporting context as a driver might be turned off for an hour, a day, or a week, and the events that happen while it is turned off queue up and will process as the driver comes back up. Thus it is important to know when an event actually happened, as opposed to when the driver reported it. Thus the Event transformation adds the data as an operation-data property which will get carried over through a synthetic add event. In fact, if you read the articles on the Managed System Gateway (MSG) driver series or others in this series you will see that these driver configurations often use operation-data to carry the payload data from a query out of cache. Which is a pretty neat approach to the problem.

Thus at this point in the Command Transform, if the driver sees an event come through without the XML attribute cached-time, then it will try to add it.

This case is detected with an XPATH not true of @cached-time, but could equally have been done with a test of if XML Attribute not available cached-time. I have to admit I did not even notice the if XML attribute condition test until someone pointed it out to me a week or three ago. I always used the XPATH, but I guess using the token is a bit more readable of an approach. I would be curious to know if there is a performance difference, though I would imagine it would be quite slight if at all. On the other hand, even if it were a bit slower, I would probably support using the more readable approach, since the driver has to be supported and supportable over its lifetime, and minor tweaks to make it easier to read are usually worth it in my eyes. This is why I am often such a big fan of using the Comments field to explain what and why you have the driver doing what it does. Recently someone pointed out that you can generate inline comments if you like, just by using the Trace token, and then setting it to be disabled. That is, you get a token that lets you enter free form text, and once disabled, adds no processing time. That sure sounds like a comment token to me!

If there is no cached-time XML attribute, then the policy will add the value from the operation property. This comes in the LDAP'y date format of 201012130101.000Z for the date string. Then they clear the operation property, which is interesting. I have never bothered clearing an operation property before, nor even considered why you might want to, and I wonder why they even bother. If anyone has an inkling I would love to know. Feel free to comment or message me directly if you do know.

If there is still no XML attribute, and there is no operation property with the timestamp for when the event was cached, then it will add the current time, since some time value is better than nothing and definitely the event happened at least this early.
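The fallback chain above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual DirXML Script in the policy: the function name, the dictionary standing in for operation properties, and the property name `cached-time` are all assumptions of mine, and the current time is passed in as a plain string rather than computed.

```python
import xml.etree.ElementTree as ET

def ensure_cached_time(event, op_props, now):
    """Make sure the event element carries a cached-time XML attribute.

    event    -- the current operation element (<add>, <modify>, ...)
    op_props -- dict standing in for operation properties (an assumption)
    now      -- current time as a string, used only as the last resort
    """
    if event.get("cached-time") is not None:
        return event  # attribute survived, nothing to do
    # Hypothetical property name; pop() also clears the property,
    # mirroring the "clear the operation property" step.
    saved = op_props.pop("cached-time", None)
    event.set("cached-time", saved if saved is not None else now)
    return event

# A modify that became a synthetic add and lost its cached-time.
synthetic_add = ET.fromstring('<add class-name="User"/>')
props = {"cached-time": "201012130101.000Z"}
ensure_cached_time(synthetic_add, props, now="201012140000.000Z")
```

After the call, the add carries the saved timestamp rather than the fallback value, and the stand-in property has been cleared.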

For add operations, the driver does something very sneaky. It sets the XML attribute create-time to the XPATH of:
add-attr[@attr-name='Object Class']/value[1]/@timestamp

That is, look into the current document (the <add> node is the current context) for the <add-attr> node whose attr-name is Object Class (<add-attr attr-name='Object Class'>). Then, since Object Class in eDirectory is almost always multivalued these days, take the first <value> node (the predicate of [1] is shorthand for [position()=1]) and get its XML attribute timestamp. This relies on the fact that the driver will return the timestamp for each attribute's create time, and that you need to provide an Object Class at create time, or else the object won't create. You might be able to get away with looking at GUID for the same information, but this approach is very neat and elegant!

Thus the timestamp of the Object Class attribute is very likely the create time for the object. They choose the first value, since later ones might be added as auxiliary classes, but the base class or two come at create time. Since everything inherits from Top and then some other object class, this is a pretty safe assumption to make.
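To see that XPATH at work outside the engine, here is a minimal Python sketch using the standard library's ElementTree, which understands the attribute and position predicates used here. The sample <add> document, its timestamp values, and the auxiliary class name are made up for illustration; only the XPATH itself comes from the policy.

```python
import xml.etree.ElementTree as ET

# Made-up sample event; timestamps and class names are illustrative only.
ADD_DOC = """
<add class-name="User">
  <add-attr attr-name="Object Class">
    <value timestamp="1292202060#1">User</value>
    <value timestamp="1292288460#1">exampleAuxClass</value>
  </add-attr>
  <add-attr attr-name="Surname">
    <value timestamp="1292202060#5">Example</value>
  </add-attr>
</add>
""".strip()

def creation_timestamp(add_event):
    """Return the timestamp of the first Object Class value, or None."""
    # ElementTree cannot return an attribute directly from a path,
    # so select the <value> node and then read its timestamp.
    first = add_event.find("add-attr[@attr-name='Object Class']/value[1]")
    return None if first is None else first.get("timestamp")

add_event = ET.fromstring(ADD_DOC)
stamp = creation_timestamp(add_event)
if stamp is not None:
    add_event.set("create-time", stamp)  # what the policy does with it
```

Note that the second value, which stands in for an auxiliary class added later, has a later timestamp and is ignored because of the [1] predicate.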

Next the dest-dn is added to the document in LDAP format. The ParseDN token is great and very powerful, and you can read more about it here: Examples of using the ParseDN Token in Identity Manager

But it cannot generate data out of nothing. Using backslash notation as Identity Manager does for its internal use, does not carry with it enough information about the type of container objects to give them LDAP naming. That is, the DN of com\acme\users\geoffc might be in LDAP notation cn=geoffc, ou=users, o=acme, dc=com or perhaps it might be as crazy as uid=geoffc, cn=users, dc=acme, dc=com both of which are equally valid.

It is also worth noting that you might end up with a T= node in there. That happens when you get a DN in the format \TREE-NAME\O\ou\user: if there is a leading backslash, then the tree name is being specified. If there is no tree name specified, then you do not start the DN with a leading backslash, which matters especially if you are building the DN as a string yourself.

Instead, what you need to do is use the XML attribute qualified-src-dn which is in a strange qualified slash format, and use ParseDN to convert to LDAP format. By default the engine will send the qualified-src-dn value on the Subscriber channel and if you wanted to, there is an engine control value (ECV) that will disable this. I believe it is on by default now on all drivers.

Interestingly enough, the ordering in the qualified slash format, root most to leaf most, is reversed from LDAP's ordering. So while in LDAP it might be:
cn=geoffc, ou=users, o=acme, dc=com

in qualified slash it would be: dc=com/o=acme/ou=users/cn=geoffc which takes a bit of getting used to.
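The reordering is easy to picture with a tiny Python sketch. This is not what ParseDN does internally, just an illustration of why the typed components make the conversion possible; the function name is mine, it emits plain commas with no spaces after them, and it ignores escaped characters inside name components.

```python
def qualified_slash_to_ldap(qualified_dn):
    """Reverse a typed, slash-delimited DN into LDAP (leaf-first) order.

    Simplification: does not handle escaped slashes or commas in names.
    """
    parts = [p for p in qualified_dn.split("/") if p]
    return ",".join(reversed(parts))

ldap_dn = qualified_slash_to_ldap("dc=com/o=acme/ou=users/cn=geoffc")
```

Because each component already carries its type (dc=, o=, ou=, cn=), nothing has to be guessed, which is exactly what the plain backslash notation cannot offer.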

Now they could have added the destination DN by using the Set XML Attribute action with the name dest-dn, but instead they used Set Operation Destination DN, which is usually used in the Placement rule but does the same thing. I suppose that is just as readable either way.

That's it for the Command Transform, now for a quick look at the Filter. It looks like a ton of attributes are set to synced to the shim, and I guess this is how the Reporting engine gets event data into the Data Warehouse. That is, the events happen in the Identity Vault, this driver events upon them like any other driver and forwards them on to the shim which hands them off to the database.

I think I see the answer to a question I had recently about how much data about Role and Resource granting will make it into the Reporting module. There are entries in the filter for nrfRole and nrfResource, which will inform the database about events relating to Role and Resource creation and modification. The User object has filter entries for nrfAssignedResources, nrfAssignedRoles, nrfContainerRoles, nrfGroupRoles, nrfInheritedRoles, DirXML-EntitlementRef, and nrfResourceHistory, which is probably all the various attributes you would need to see Roles and Resources being granted to a User. I see also that the Organizational Unit object has the nrfAssociatedRoles attribute in the filter, as does the Group object. I do NOT see the nrfAssociatedRoles attribute in the filter for dc or Organization objects, which makes me wonder if perhaps RBPM does not allow for the assignment of such an attribute to a domain or Organization object. That is worth checking into. I see in schema that it comes from the auxiliary class nrfContainer, and I wonder if RBPM just does not support assigning it to domains and Organizations or if this is just an oversight.

The filter and the Schema Map kind of go hand in hand, so now a quick look at the Schema Map shows that basically all the attributes you might see are being mapped into LDAP namespace style attribute names. That is, the first character is lower cased, any spaces in the attribute name are removed, and the result is camel cased if there was more than one word in the attribute name. Having just looked at the filter, it is clear that not everything is being remapped, usually just attributes with spaces in the name. The newer attributes from Novell are being created in eDirectory in a more LDAP compliant fashion.
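As a rough illustration of that naming convention (and only the convention, since I am not reproducing the driver's actual Schema Map entries here), a short Python sketch might look like this. The function name is mine.

```python
def to_ldap_style(name):
    """Lower-case the first character, drop spaces, camel-case later words."""
    words = name.split()
    if not words:
        return name
    first = words[0][:1].lower() + words[0][1:]
    rest = [w[:1].upper() + w[1:] for w in words[1:]]
    return first + "".join(rest)

examples = {n: to_ldap_style(n) for n in ("Object Class", "Full Name", "nrfAssignedRoles")}
```

Names that already follow the LDAP style, like nrfAssignedRoles, pass through unchanged, which matches the observation that mostly the attributes with spaces in their names need remapping. Real LDAP mappings do not always follow the rule mechanically, so treat this as a picture of the pattern rather than a lookup table.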

Of interest is that for the container-like objects (domain, Organization, Organizational Unit) the naming attributes (dc, o, and ou respectively) are remapped to _dcsName and the Description attribute is remapped to _dcsDescription, which is worth noting. I am not sure why this is done, but it is probably related to how the data is stored in the underlying reporting database (the Data Warehouse).

Onwards into the Output Transformation policies. Here we have four Policy objects. The first is NOVLIDMDCSB-otp-RegistrationQueryResponse which is similar to how the Managed System Gateway driver works for handling queries out of the cache.

Basically, as we discussed in the first article in the series, when the driver shim needs to get some data from the cache that was built by the Input Transform, the data is copied into the <operation-data> node via a Clone by XPATH statement. This gets sent into the engine, which returns nothing, since the query is not really valid. The returned document gets the operation data added back to it after it comes out of the engine, and then here in the Output Transform, if the case is the right one, it copies the result out of the <operation-data> node via a Clone by XPATH of ./operation-data/instance[@class-name=../@query-id], which is kind of interesting. It looks at the operation-data node, which is a child of the current node (which is always the current event like <add>, <modify>, <instance>, or the like), looking for the <instance> node inside there whose XML attribute class-name matches query-id. Inside the predicate the context node is the <instance> itself, so ../@query-id steps up to its parent, the <operation-data> node, and reads the query-id attribute there, which fits with query-id being an operation property, since those are stored as attributes on <operation-data>. This saves copying the @query-id into a variable to do the compare. Classy. It copies this into the node above it (..) and then strips out the current node.
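The selection step of that Clone by XPATH can be mimicked in Python. ElementTree cannot compare two attributes inside a predicate, so this sketch walks the nodes by hand and takes the query-id as a plain argument. The sample document and the function name are my own, for illustration only.

```python
import xml.etree.ElementTree as ET

def select_cached_instance(event, query_id):
    """Return the <instance> under <operation-data> whose class-name
    equals query_id, or None if there is no such node."""
    op_data = event.find("operation-data")
    if op_data is None:
        return None
    for inst in op_data.findall("instance"):
        if inst.get("class-name") == query_id:
            return inst
    return None

# Made-up response document carrying two cached payloads.
DOC = ET.fromstring(
    '<status level="success">'
    '<operation-data>'
    '<instance class-name="__DCS_REGISTRATION__"/>'
    '<instance class-name="somethingElse"/>'
    '</operation-data>'
    '</status>'
)
match = select_cached_instance(DOC, "__DCS_REGISTRATION__")
```

The policy then clones the matching node up a level and strips the original event, which this sketch leaves out.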

The thing I wonder about with this rule is that the conditions under which it fires are twofold: that the operation is status, and that the operation property query-id is available. The Input Transform takes a <query> doc with a class-name XML attribute of __DCS_REGISTRATION__ and sets the operation property query-id to that same value.

However, I would think that the response to a <query> doc from the Publisher channel would be an <instance> document, not a <status> document, so short of seeing this in trace, I am not entirely sure what is going on here.

That's it for now. Next up we have to finish working through the Output Transformation, where there are a couple (three) more rules, related to reformatting some data (Member Queries, Format Conversions in Policy, and Format Conversions via the use of an XSLT style sheet). I was browsing through the Designer interface looking for more interesting things to talk about and I had noticed some interesting text in the Readme for this driver that I thought would be worthy of discussion.

