In Identity Manager 4.0, Novell has introduced a number of new features. There are four new driver configurations: two for applications (Salesforce.com and SharePoint) and two for IDM's own use, the Managed System Gateway driver and the Data Collection Service driver.
The Managed System Gateway driver is primarily used by the Reporting module to get information about users out of IDM and into the Reporting database. This is somewhat analogous to how the Identity Audit extension policies that were added to drivers are used to get Identity information into the Sentinel database.
As with many things in IDM 4, this is totally new stuff and will take some time to get used to. You can read more about the changes between the various IDM versions in these articles:
One of the main new features is Packages, which is critical to all of this working, as there are packages that add support for the Reporting module to each of the many driver configurations. In fact, the same approach is used for the Identity Audit extensions as well. This is different from the past, where the policies were stored centrally in Libraries and linked into each driver as needed. With Packages the content is actually duplicated in many places, but upgrades are easier than under the previous model. I have been working on a series on Packages in Designer 4 that you can use to gain some insight:
The Managed System Gateway (MSG) driver is one interesting critter. It is doing all sorts of funky and interesting things, so it is worth discussing the low-level functionality. After all, if you do not know what it is supposed to be doing, how would you know what it is not doing when it is not working? Most connected system drivers are pretty traditional: an event comes out of the application or eDirectory as an XDS document (which is what the shim's job is: convert the application's events into XDS and convert XDS into things the application understands), which is then processed in the flow.
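Just for illustration, a minimal XDS add event (with made up names) that a shim might hand to the engine looks something like:

<nds dtdversion="3.5">
  <input>
    <add class-name="User" src-dn="\ACME-TREE\acme\users\jdoe">
      <add-attr attr-name="Surname">
        <value type="string">Doe</value>
      </add-attr>
    </add>
  </input>
</nds>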
There was the ID Provider driver from Novell that basically allows you to get IDs that are unique and next in sequence. There was the State Machine driver in the Compliance Management Platform for handling the lifecycle of users. I have not really looked at the Sentinel driver, but I imagine it is doing something somewhere in between. Now we have a couple more drivers that are not doing traditional synchronization tasks.
Now on to more fun adventures diving hip deep into audacious policy!
In the second article we set some Boolean variables based on whether the Matching policy is matching users directly (an if-class-name equals User condition) or indirectly (no class-name conditions at all, therefore some other approach is being used, probably the same match criteria for all object classes in the filter), and whether there is any attribute-based matching going on.
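If you want to poke at this yourself, the tests boil down to XPATH along these lines (my paraphrase of the approach, not the shipping policy verbatim):

Direct match: $current-node//conditions//if-class-name[@op='equal' and . = 'User']
Indirect match: count($current-node//conditions//if-class-name) = 0
Attribute matching: $current-node/actions//do-find-matching-object/arg-match-attr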
Now if all those conditions are true (well, Direct or Indirect is true, AND attribute matching is true), then, since we are looping through potentially many policy objects and rules, if this is our first time through (tracked with a variable set to true before we entered the loop) we add an <attr> node with an attr-name="rule" XML attribute to the local variable we are storing the rule cache in. (This was initialized with some header info before we started looping; I will show the full structure below.)
So far in the rule cache variable we have the header below. Since we are actually looping through all drivers, each driver with an MSysInfo Global Configuration Variable (GCV) object is in scope of this driver and therefore will get an <instance> node in the cache. This header comes from the MSysInfo GCV data.
<cache>
<instance class-name="MANAGED_SYSTEM_MATCHING" src-dn="cn=msysinfo,cn=acme domain,cn=driverset1,o=system">
<association>B097B0E7-A932-2540-63A7-B097B0E7A932</association>
</instance>
</cache>
This last segment of code just added the line:
<attr attr-name="rule"/>
giving us:
<cache>
<instance class-name="MANAGED_SYSTEM_MATCHING" src-dn="cn=msysinfo,cn=acme domain,cn=driverset1,o=system">
<association>B097B0E7-A932-2540-63A7-B097B0E7A932</association>
<attr attr-name="rule"/>
</instance>
</cache>
Now we only add the <attr> node once, and the rest are going to be <value> nodes below it.
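As a rough DirXML-Script sketch of that first-time-through guard (the variable names are mine, not necessarily what the shipping policy uses):

<do-if>
  <arg-conditions>
    <and>
      <if-local-variable mode="nocase" name="first-time" op="equal">true</if-local-variable>
    </and>
  </arg-conditions>
  <arg-actions>
    <!-- add the <attr attr-name="rule"/> node to the newest instance -->
    <do-append-xml-element expression="$rule-cache/cache/instance[last()]" name="attr"/>
    <do-set-xml-attr expression="$rule-cache/cache/instance[last()]/attr[last()]" name="attr-name">
      <arg-string>
        <token-text>rule</token-text>
      </arg-string>
    </do-set-xml-attr>
    <do-set-local-variable name="first-time" scope="policy">
      <arg-string>
        <token-text>false</token-text>
      </arg-string>
    </do-set-local-variable>
  </arg-actions>
</do-if>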
Next add a <value type="structured"> node, and then loop through all nodes in the nodeset:
$current-node/actions//do-find-matching-object/arg-match-attr/@name
which, like in the previous article, will look in the <actions> nodes for any (//) <do-find-matching-object> nodes that have an <arg-match-attr> node, and return the XML attribute 'name' value as the new loop's current-node.
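In DirXML-Script terms, the loop skeleton is roughly this (my sketch; the trace message is just there to make the re-binding visible):

<do-for-each>
  <arg-node-set>
    <token-xpath expression="$current-node/actions//do-find-matching-object/arg-match-attr/@name"/>
  </arg-node-set>
  <arg-actions>
    <!-- inside the loop, $current-node re-binds to one 'name' XML attribute -->
    <do-trace-message level="3">
      <arg-string>
        <token-text>Match attribute: </token-text>
        <token-xpath expression="$current-node"/>
      </arg-string>
    </do-trace-message>
  </arg-actions>
</do-for-each>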
Then we add each of these into the rule cache variable as <component name="CN"/> nodes. Along the way it does something very cute and uses the approach for doing a Schema Map in XPATH (see: XPATH to do schema mapping rule) on each attribute
to make sure it is mapped, and if it is, to use the mapped attribute name; otherwise use the listed attribute name as-is. Interestingly, this is using the MSG driver's schema map; I guess this is to allow you to globally map attributes you might want the Reporting module to treat differently. This has consequences that will take some thinking about before I am sure of what it really means. I think initially this is meant to normalize the eDirectory schema to the Reporting database's schema, but I am thinking there might be some interesting use cases for this.
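The lookup itself, assuming the mapping rule XML has been read into a local variable (I am calling it attrMap here; a mapping rule is a list of <attr-name> nodes with <nds-name> and <app-name> children), is an XPATH along the lines of:

$attrMap//attr-name[nds-name = $current-node]/app-name

If that returns a node, use its value (FULL_NAME, say); if not, fall back to the attribute name you started with.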
There is a little bit of cleanup, since there is a possibility that no such attribute matching was found in the XML, but the <value> node was added before the loop ran, so clean up with a strip by XPATH of:
$responseInstance/instance/attr[@attr-name='rule']/value[count(component)=0]
This will find any <value> nodes (in the responseInstance variable, under the instance/attr nodes) with no component children and remove them, i.e., empty <value> nodes. Nice touch cleaning up like that!
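In policy, that cleanup is just a single strip action, something like:

<do-strip-xpath expression="$responseInstance/instance/attr[@attr-name='rule']/value[count(component)=0]"/>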
Then this is repeated for the other channel: we started looking at the rules in the Subscriber channel, and now the same basic rule is repeated for the Publisher channel. The primary need for repeating this segment of code is that the DirXML-Policies attribute uses a different value to identify the Matching policies on the Subscriber channel and the Publisher channel. In the Typed Name attribute syntax, the 'interval' component is 6 for Subscriber Matching policies and 7 for Publisher Matching policies.
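For reference, a Typed Name value comes back from a query as a structured value, so a DirXML-Policies value for a Subscriber Matching policy would look roughly like this (component names quoted from memory, so treat this as illustrative):

<value type="structured">
  <component name="volume">cn=Match Users,cn=Subscriber,cn=Active Directory,cn=driverset1,o=system</component>
  <component name="level">0</component>
  <component name="interval">6</component>
</value>

Selecting the Publisher channel's Matching policies is then just a matter of filtering on an interval component of 7 instead of 6.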
I think this could have been combined, since they report into the same nodeset variable without distinguishing which channel the match came from, but for ease of reading I guess this works as well. Finally, after all the looping and work is done, the responseInstance variable is copied into the RULE_CACHE variable, which for a driver set with only a simple Active Directory driver looks like:
<cache>
<instance class-name="MANAGED_SYSTEM_MATCHING" src-dn="cn=msysinfo,cn=acme domain,cn=driverset1,o=system">
<association>B097B0E7-A932-2540-63A7-B097B0E7A932</association>
<attr attr-name="rule">
<value type="structured">
<component name="CN"/>
</value>
<value type="structured">
<component name="CN"/>
</value>
<value type="structured">
<component name="FULL_NAME"/>
</value>
</attr>
</instance>
</cache>
If there were more than one driver in the system with support for the Managed System Gateway module, then there would be more than one <instance> node containing driver info; you can imagine how that might look. It would seem that this Active Directory driver has three different matching rules: two by CN and one by Full Name, with Full Name mapped through the MSG driver schema (which is really the Reporting database schema, I would imagine) to FULL_NAME.
This makes sense, as the normal Matching rule in a modern Active Directory driver uses some GCVs to control mirrored or flat placement, and to decide whether we use the Full Name or the CN (usually mapped to sAMAccountName in the AD driver) as the naming attribute in AD.
I have so far ignored an entire concept that is used in all the caching for this driver. The Subscriber and Publisher channels run as independent threads and in theory (and occasionally in practice) can interfere with each other. Thus they have this concept of a Cache Mutex. Basically, before they start building any of these caching local variables, they make sure the mutex is free (trying three times, with a one second pause after each attempt), and if they get it, they signal that they are holding it, to block the other thread from taking it.
This is less important for 'fast' operations, but there is a LOT of work being done in these various policies, and instead of taking milliseconds to process, they might run into the range of seconds during driver startup.
Additionally, since these caches need to be populated before any responses out of them can occur, they need to be built at driver startup, and driver startup is a particularly busy time, with all sorts of things going on, taking time, and interleaving events. Even worse, as soon as the driver is 'started', events in the queue will start trying to process almost immediately, so there are some serious timing issues that can occur.
This driver takes an approach I have never seen before in a driver, but then again, it is doing lots of things I have never seen before all over the place!
You can see this in the first couple of rules in each of the policy sets in the Input Transform, since that is where the cache population policies reside.
The first thing they do is ensure each of these cache building policies runs only a single time, and only after the driver has completed starting up; they use two tests for this. First, if you query for the object class of the current driver (which you get with the dirxml.auto.driverdn GCV, which returns the running driver's DN when referenced), you do not get a response back until the driver is sufficiently initialized to do much of anything else. I have seen this before and used it before; it is a very handy trick to know. Otherwise you see strange events when you rely on some preloaded data, and some event in the queue on the Subscriber channel starts to process before the Publisher channel is finished, and you do not yet have your data preloaded.
The other test is to see if the local variable (Driver scoped, so it is available until the driver is shut down) already has a value.
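A sketch of those two tests together (the token-query form is mine; the shipping policy may phrase it differently):

<do-if>
  <arg-conditions>
    <and>
      <!-- test 2: only run if the driver-scoped cache variable is empty -->
      <if-local-variable name="RULE_CACHE" op="not-available"/>
    </and>
  </arg-conditions>
  <arg-actions>
    <!-- test 1: query for our own driver object; no result means the
         driver is not fully started yet -->
    <do-set-local-variable name="driver-up" scope="policy">
      <arg-node-set>
        <token-query class-name="DirXML-Driver" datastore="src" scope="entry">
          <arg-dn>
            <token-global-variable name="dirxml.auto.driverdn"/>
          </arg-dn>
          <arg-string>
            <token-text>Object Class</token-text>
          </arg-string>
        </token-query>
      </arg-node-set>
    </do-set-local-variable>
    <do-if>
      <arg-conditions>
        <and>
          <if-xpath op="true">$driver-up</if-xpath>
        </and>
      </arg-conditions>
      <arg-actions>
        <do-trace-message level="1">
          <arg-string>
            <token-text>Driver is up; building rule cache</token-text>
          </arg-string>
        </do-trace-message>
      </arg-actions>
    </do-if>
  </arg-actions>
</do-if>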
Next up is the mutex local variable itself, which for the rule cache is known as RULE__CACHE__MUTEX; if it is true, the other channel probably has it. They try three times, each time using the doWait() function from the AJC ECMA functions to pause for a second to let the other thread complete. If the mutex comes free, this thread sets the value to true and proceeds. If not, it tries again.
When the work is all done, it sets the variable to false to release its lock. Very neat approach.
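Put together, the mutex dance looks roughly like this (the es: prefix for the ECMA library, the doWait() argument being milliseconds, and the retry bookkeeping are my guesses):

<do-set-local-variable name="tries" scope="policy">
  <arg-string>
    <token-text>0</token-text>
  </arg-string>
</do-set-local-variable>
<do-while>
  <arg-conditions>
    <and>
      <if-local-variable mode="nocase" name="RULE__CACHE__MUTEX" op="equal">true</if-local-variable>
      <if-local-variable mode="numeric" name="tries" op="lt">3</if-local-variable>
    </and>
  </arg-conditions>
  <arg-actions>
    <!-- pause a second so the other channel's thread can finish -->
    <do-set-local-variable name="ignored" scope="policy">
      <arg-string>
        <token-xpath expression="es:doWait(1000)"/>
      </arg-string>
    </do-set-local-variable>
    <do-set-local-variable name="tries" scope="policy">
      <arg-string>
        <token-xpath expression="$tries + 1"/>
      </arg-string>
    </do-set-local-variable>
  </arg-actions>
</do-while>
<!-- grab the mutex, do the work, then release it -->
<do-set-local-variable name="RULE__CACHE__MUTEX" scope="driver">
  <arg-string>
    <token-text>true</token-text>
  </arg-string>
</do-set-local-variable>
<!-- ... build the cache here ... -->
<do-set-local-variable name="RULE__CACHE__MUTEX" scope="driver">
  <arg-string>
    <token-text>false</token-text>
  </arg-string>
</do-set-local-variable>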
What you also see in each rule that later reads out of the cache (they are all located in the Publisher channel Event Transform, in a series of Policy objects named Dispatch <some name> query) is the same approach: try to see if something else has the mutex, and if it does, wait for a full second, up to three times, until it either succeeds or gives up. Once it has access to the mutex, it grabs it, which locks out any changes to the cache until it is finished and releases it.
Personally, at this level, I see why they did all the work here, at least for consistency, and once it is written it is really easy to reuse. But if you think about it, the core reason to lock the mutex is to prevent the queries from responding from an empty cache before it is populated, or while it is being populated. Once the cache is populated, though, it stays there for the duration of the driver's lifetime, until the driver is next restarted. So I do not really think it is needed here. However, there is no real harm in it, of course.
Stay tuned for part 4 where we look more at some of the queries and the uniquely interesting way they are handled!