Data Collection Service Driver Walkthrough - Part 4

With Novell Identity Manager 4.0 there are a number of new features available. You can read more about those features in these articles:

There are four new drivers: two are for newly supported connected systems ( and Sharepoint), and two are service drivers that are needed for the Reporting module.

These two drivers are the Managed System Gateway (MSG) driver, and Data Collection Service (DCS) driver, the first of which you can read about in this series of articles:

The Data Collection Service driver is the second half of that pairing, and will be the topic of this series. Both of these drivers are meant to enable the Reporting module to get enough information about the system to report upon it. The MSG driver is focused more on providing information about how the drivers are configured (heck, it even tries to infer the matching rule criteria by reading the rules out of the objects), and the DCS driver is focused more on collecting events about objects for storage in the Reporting database.

In the first article Data Collection Service Driver Walkthrough - Part 1 I started looking at how it builds a cache variable and got through most of the work it does to get the correct IP Address of the server running the Managed System Gateway driver.

In the second article Data Collection Service Driver Walkthrough - Part 2 I finished working through how the cache is built.

In the third article Data Collection Service Driver Walkthrough - Part 3 I looked at how queries out of the cache are handled, and what the filter for this driver sends to the shim.

In this article I will finish up one last rule in the Input Transform that handles some error cases and start discussing the Subscriber channel.

The last rule in the Input Transform is called NOVLDCSERROR-itp-ErrorHandler, and it handles errors coming back from the shim. The comment field here is useful, which I love to see, and they did it my favorite way! They pasted in a sample of the error they are trying to handle in the rule. It says "Unable to process [modify] event from DCS driver; Object DN: <dn>, Structural class: <class>, Reason: Associated object not found in database with GUID: <guid>"

When a <status> event comes back from the shim, and it has an XML attribute return-code of 2, and an XML attribute level of error, and there is an object-guid XML attribute available, then the shim is returning a specific error case.

A bunch of variables get set, rc for the @return-code, guid for @object-guid, and then some cute machinations for event id.

The event-id looks like it will contain a #retry-1 value, which is useful, to know how many times the shim has tried this issue. But the driver uses a cute method to get the retry count out of the event id.

First it uses this XPATH to get a useful event ID and retry counter. Usually event ID values are separated by the number sign (or pound #) and thus you can break them apart based on that. I do not have a sample trace handy with an actual event id in it, but you can guess what it looks like. Something like "0#something#retry-2#someEvent"

The event id is selected with this XPATH:

substring-after(substring-after(@event-id, '#retry-'), '#')

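So the inner substring-after gets everything after the #retry- part, which would be say 2#someEvent, and the outer substring-after then drops everything up to and including the next #, leaving someEvent, which is probably unique enough.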

Next it tries to get the retry count by a slight variation with the XPATH:

substring-before(substring-after(@event-id, '#retry-'), '#')

So this time, get that 2#someEvent the same as above, but now substring before the # to get just the integer 2.

That was if the event-id XML attribute had the string #retry- in it. If it does not, just use the event-id whole, and set the retry count to 0.
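The two XPATH expressions and the fallback are easy to try outside of the engine. Here is a minimal Python sketch of the same string handling. It is not the driver's policy: the function names and the sample event id are my own assumptions, and the two helpers only imitate the XPATH 1.0 behaviour of substring-after and substring-before (empty string when the separator is missing). Treating a non-numeric counter as 0 is also my assumption.

```python
def substring_after(text, sep):
    # XPath 1.0 style: empty string when sep does not occur in text.
    head, found, tail = text.partition(sep)
    return tail if found else ""


def substring_before(text, sep):
    # XPath 1.0 style: empty string when sep does not occur in text.
    head, found, tail = text.partition(sep)
    return head if found else ""


def split_event_id(event_id):
    """Return (event id, retry count) for a hypothetical event-id string."""
    marker = "#retry-"
    if marker not in event_id:
        # No retry marker: use the event id whole and start the count at 0.
        return event_id, 0
    tail = substring_after(event_id, marker)   # e.g. "2#someEvent"
    new_id = substring_after(tail, "#")        # e.g. "someEvent"
    count = substring_before(tail, "#")        # e.g. "2"
    # Assumption: anything that is not a plain integer counts as 0.
    return new_id, int(count) if count.isdigit() else 0


print(split_event_id("0#something#retry-2#someEvent"))  # ('someEvent', 2)
print(split_event_id("0#something#someEvent"))          # ('0#something#someEvent', 0)
```

Running it on a guessed event id is a quick way to see which piece each nested substring call keeps.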

Since we have the GUID of the object with the error, next the driver reads the object back, using the GUID as the association value. When you reference an object, DirXML Script lets you specify a full DN or an association value. This can be very handy, as it can save the step of using something like the Resolve token to find the object's DN and then querying for that object's attributes.

Then, if the retry count is not higher than the maximum retries global configuration variable (GCV) value, it tries again by removing the DirXML-Associations value for this driver and then adding it back with a state of 4, which means migrate. It uses the add and remove destination attribute value tokens with the Structured type. Usually you would use the string or DN syntax type, but structured is very useful when trying to manage an attribute with a syntax of Path. DirXML-Associations uses this really powerful syntax type, which has three components. The nameSpace component is a 32 bit integer and holds the association state info. The volume component is DN syntax and holds the DN of the driver to which this association belongs. Finally, the path component holds the string which is the association value. You can see a list of different association values by driver in this article: Open Call - IDM Association Values for eDirectory Objects

To clear such an attribute is easy. To remove one value, you have to have provided all the components for an exact match. Which is a shame, as it would be nice to be able to say remove all such attributes from this user, where the DN matches, but whatever path value is found.

To add it, you also need to provide all the component values, and this driver is a nice sample of how you might do that. Often when writing policy to do this, I loop through all the association values and test each one to see if the driver DN is equal to the current driver's. If so, I remove that value, selecting the correct data with an XPATH along the lines of $current-node/component[@name='nameSpace'], and so on for all three of the components.
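To make the three components a bit more concrete, here is a small Python sketch that builds a structured value and then picks out the values that belong to one driver, the way the loop described above would. The XML shape (a value node of type structured holding three component nodes) is my reading of how these values show up in trace, and the DNs, association strings, and function names are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def association_value(state, driver_dn, association):
    """Build a <value type="structured"> node with all three components."""
    value = ET.Element("value", {"type": "structured"})
    parts = (
        ("nameSpace", str(state)),   # association state, 4 means migrate
        ("volume", driver_dn),       # DN of the driver that owns the association
        ("path", association),       # the association value itself
    )
    for name, text in parts:
        component = ET.SubElement(value, "component", {"name": name})
        component.text = text
    return value


def values_for_driver(values, driver_dn):
    """Keep only the values whose volume component equals driver_dn."""
    return [
        v for v in values
        if v.findtext("component[@name='volume']") == driver_dn
    ]


# Hypothetical driver DNs and association strings.
dcs = r"\TREE\system\DriverSet\DCS"
other = r"\TREE\system\DriverSet\AD"
values = [
    association_value(1, other, "some-ad-association"),
    association_value(4, dcs, "some-guid-association"),
]
for match in values_for_driver(values, dcs):
    print(ET.tostring(match, encoding="unicode"))
```

Since a remove needs an exact match on all three components, reading the existing value first and echoing its components back is the safe way to build the remove.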

Then it adds the event id attribute into the <modify> event with the counter incremented by one, using the Set XML Attribute token and the XPATH of ../modify. This will actually add it to all the modify events it is sending into eDirectory, which is fine, since we really only care about one coming back potentially as an error. Then, after validating that there is a variable holding the GUID of the driver handy (and if not, querying for it), the policy adds the association-ref XML attribute with the Set XML Attribute token, this time using an XPATH of:
../modify[contains(@event-id, '#retry-')]/modify-attr[@attr-name='DirXML-Associations']//component[@name='volume']

This one says, in the parent of this context (..) find the <modify> node whose XML attribute event-id (@event-id) contains the string "#retry-", then look for a <modify-attr> node under that <modify> node, whose XML attribute attr-name (@attr-name) is DirXML-Associations, and then find the <component> node whose XML attribute name (@name) is "volume". It adds the driver GUID into the <component name='volume' association-ref='someGUIDvalue'>
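If you want to convince yourself of what that XPATH selects, the walk can be spelled out step by step. The Python sketch below does the same selection by hand on a made-up document. The sample XML, the GUID string, and the function name are assumptions of mine, and since ElementTree has no contains() function, that test is done in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one retry modify and one ordinary modify.
SAMPLE = """
<input>
  <status level="error" event-id="0#something#retry-0#someEvent"/>
  <modify event-id="0#something#retry-1#someEvent">
    <modify-attr attr-name="DirXML-Associations">
      <add-value>
        <value type="structured">
          <component name="nameSpace">4</component>
          <component name="volume">driver-dn-here</component>
          <component name="path">association-here</component>
        </value>
      </add-value>
    </modify-attr>
  </modify>
  <modify event-id="plain-event">
    <modify-attr attr-name="Description"/>
  </modify>
</input>
"""


def set_association_ref(parent, driver_guid):
    """Stamp association-ref on the volume components; return how many."""
    touched = 0
    for modify in parent.findall("modify"):               # ../modify
        if "#retry-" not in modify.get("event-id", ""):   # contains(@event-id, '#retry-')
            continue
        for modify_attr in modify.findall("modify-attr"):
            if modify_attr.get("attr-name") != "DirXML-Associations":
                continue
            for component in modify_attr.iter("component"):   # //component
                if component.get("name") == "volume":          # [@name='volume']
                    component.set("association-ref", driver_guid)
                    touched += 1
    return touched


root = ET.fromstring(SAMPLE)
print(set_association_ref(root, "someGUIDvalue"))  # 1
```

Only the volume component under the retry modify is touched; the plain modify is skipped by the event-id test.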

By setting the DirXML-Associations state value (the nameSpace component) to 4, it forces the driver to try and migrate the object again, which is a great way of forcing a retry. The engine will try and maintain the event id even as it changes the event from a modify to a migrate to potentially a synthetic add or modify event. This way, should that fail again, the event ID with the retry count in it will survive to be read and then incremented, so that it does not get into an infinite loop and will eventually stop trying.
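The reason the counter prevents an infinite loop can be shown with a toy model. This Python sketch only models the counting, not the engine: it assumes every retry fails, that the comparison is "not higher than the maximum", and that nothing else stops the event along the way. The function names are mine.

```python
def next_retry(retry_count, max_retries):
    """Return the counter for the next attempt, or None to give up."""
    if retry_count > max_retries:
        return None            # counter is past the GCV value, stop retrying
    return retry_count + 1     # stamp the incremented counter on the new event


def retries_until_stop(max_retries):
    """Count how many retries happen if every single attempt fails."""
    count = 0
    retries = 0
    while True:
        following = next_retry(count, max_retries)
        if following is None:
            return retries
        retries += 1
        count = following


print(retries_until_stop(3))  # 4
```

Because the counter only ever goes up and is compared against a fixed limit, the loop has to end, whatever the limit is set to.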

Then just as a reporting rule, they trace out a message about the status of the event, for all returned events with a return-code. If it is other than 0 it is an error, but even in the 0 case it will trace out a message that its state was 0, for completeness. Nice touches again.

Next up to look at is the Subscriber channel, since basically nothing much else happens on the Publisher channel of this driver. The Subscriber Event Transform policy set has three rules.


Let's go through those one by one. First up is the NOVLDCSERROR-sub-etp-ErrorHandler policy object. This has a rule to veto loop back retry events. It is looking for events that are not a sync or a status event, and it uses the regular expression equality test so that both can be checked in one condition token. This lets you specify sync|status, where the pipe symbol (|) means OR in regular expressions. However, by itself this would be an inherently dangerous rule: without any further conditions, several driver startup documents would be processed by this rule and vetoed, and the driver would fail to start. That is a common rookie mistake in IDM. The fix is trivial. Make sure there is an additional condition so you do not veto too much. You can read more about this issue in the article: Avoiding Startup Vetos with Scoping Rules

To avoid this issue, they additionally test for an XPATH condition of:
contains(@event-id, '#retry-1')

That is, the XML attribute event-id needs to contain the string #retry-1 in it. I guess this assumes you will never need more than one try to succeed, since the previous rule starts at 0 and increments the retry-X counter in the event-id. I wonder if they really should be testing for the case of retry- but not retry-0. In other words any value other than retry-0. Though I guess with this it would never get to retry-2 anyway, so maybe it is all academic.

Once an event hits this condition, they trace out a message, send a status event with level of success, and a message payload that the retry loop back event was vetoed, and finally they veto the event.
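Put together, the two conditions of this rule amount to a small predicate. Here is a Python sketch of it, with the function name and sample event ids being my own inventions. It assumes the regular expression has to match the whole operation name, which is why it uses fullmatch.

```python
import re


def should_veto(operation, event_id):
    """True when the loop back retry rule would veto this event."""
    # Condition 1: operation is NOT sync or status (one regex, pipe means OR).
    is_sync_or_status = re.fullmatch(r"sync|status", operation) is not None
    # Condition 2: contains(@event-id, '#retry-1'), a plain substring test.
    has_retry_marker = "#retry-1" in event_id
    return (not is_sync_or_status) and has_retry_marker


print(should_veto("modify", "0#something#retry-1#someEvent"))  # True
print(should_veto("status", "0#something#retry-1#someEvent"))  # False
print(should_veto("modify", "0#something#someEvent"))          # False
print(should_veto("modify", "0#something#retry-12#someEvent")) # True
```

The last call shows a side effect of using a substring test: any counter that starts with the digit 1 (10, 12, and so on) also satisfies it, while retry-0 and retry-2 do not.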

Next up is the NOVLDCSERROR-sub-etp-DirtyRegistrationCache policy. Its first rule makes sure the msgw-driver-slash-dn local variable is available. If not, it sets it as a driver scoped variable from the GCV that stores the value.

The next rule handles the case where a driver configuration changes, since that could mean the Managed System Gateway driver's configuration has changed, in which case it would be important to rebuild the cache built in the Input Transform.

There is no great way in the event model to say, send me events for this object or set of objects only. So the driver has to event on all driver objects changing, and then filter the events out here to make sure that it only dirties the cache if the changed object is the current driver (using the GCV) or the MSG driver (using the value of the msgw-driver-slash-dn local variable to compare).

If the Data Collection Service or MSG driver has changed, then it sets the regnCacheInitialized variable to false, and the next event through the Input Transform will notice this and rebuild the registration cache, keeping the data complete.
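The filtering described above boils down to a compare against two DNs followed by flipping a flag. Here is a minimal Python sketch of that idea. The dictionary standing in for the driver scoped variables, the DNs, and the function name are all hypothetical, and comparing the DNs without regard to case is my assumption.

```python
def on_driver_changed(changed_dn, dcs_driver_dn, msg_driver_dn, state):
    """Dirty the registration cache only for the two drivers we care about."""
    watched = {dcs_driver_dn.lower(), msg_driver_dn.lower()}
    if changed_dn.lower() in watched:
        # The next event through the Input Transform sees this and rebuilds.
        state["regnCacheInitialized"] = False
        return True
    return False


# Hypothetical DNs, purely for illustration.
dcs = r"\TREE\system\DriverSet\Data Collection Service Driver"
msg = r"\TREE\system\DriverSet\Managed System Gateway Driver"
state = {"regnCacheInitialized": True}

on_driver_changed(r"\TREE\system\DriverSet\Active Directory", dcs, msg, state)
print(state["regnCacheInitialized"])  # True, unrelated driver is ignored

on_driver_changed(msg, dcs, msg, state)
print(state["regnCacheInitialized"])  # False, cache will be rebuilt
```

Every driver object change still reaches the rule, but only the two watched DNs end up costing a cache rebuild.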

Now this is an interesting approach that the MSG driver uses as well, and I have a minor concern. With the ability to make changes and not restart a driver, this seems to be making a dangerous assumption: that any changes made go into effect right away. Often the case is that changes are pushed out to the tree, but the driver is not restarted until some maintenance window. Now I do not know how often this happens, but I could see this changing data a bit too soon in some cases. Alas, I cannot think of a simple way to event on one of these drivers restarting, though I suppose now with Packages, you could just add some policy so that every time a driver starts up it generates an event on itself that this driver could watch for. But then you have two disconnected events, and it is somewhat tricky to correlate them with the time the change actually applies. Tricky issue, and I guess the exposure is a bit low, so maybe it should just be noted, and care taken to avoid getting into this fairly rare error case.

I think I will have to try and pass this one back to Novell to see what they have to say about it. I guess it would be better if I had a solution in hand, instead of just complaining about it. I think eventing on a driver restart in just the case of the MSG and DCS drivers would be scalable but in the MSG driver it events on all driver changes, so that would have a deeper issue to work through.

Next article I will try and finish the Event transformation rules.

