Trying to understand the Managed System Gateway driver in IDM 4 - Part 1

In Identity Manager 4.0 Novell has introduced a number of new features. There are four new driver configurations: two for applications (Salesforce.com and SharePoint) and two for IDM itself to use, the Managed System Gateway driver and the Data Collection Service driver.

The Managed System Gateway driver is primarily used by the Reporting module to get information about users out of IDM and into the Reporting database. This is somewhat analogous to how the Identity Audit extension policies that were added to drivers are used to get Identity information into the Sentinel database.

As with many things in IDM 4, this is totally new stuff, and will take some time to get used to. You can read more about the changes between the various IDM versions in these articles:






One of the main new features is Packages, which are critical to making all this work, as there are packages that add support for the Reporting module to each of the many driver configurations. In fact the same approach is used for the Identity Audit extensions as well. This is different from the past, where the policies were stored centrally in Libraries and linked into each driver as needed. Now with Packages the content is actually duplicated in many places, but upgrades are easier than under the previous model. I have been working on a series on Packages in Designer 4 that you can use to gain some insight:










The Managed System Gateway (MSG) driver is one interesting critter. It is doing so many funky and interesting things that it is worth discussing the low level functionality. After all, if you do not know what it is supposed to be doing, how would you know what it is not doing when it is not working? Most connected system drivers are pretty traditional. That is, an event comes out of the application or eDirectory as an XDS document (which is what the shim's job is: convert the application's events into XDS, and convert XDS into things the application understands), which is then processed in the flow.

However there has been a trend toward using drivers in other ways, for example I discuss a concept I call Toolkit drivers, where you can build utility rules that report on data in the tree, or perhaps fix some bad data, or some such. You can read more about that in these articles:








In that model, some event, perhaps a Job, triggers the rule, which then does something pretty much unrelated to the triggering event. That is, it is not processing an <add> or <modify> event and doing something related. It is processing a <trigger> event and generating a report that it sends via email.

There was the ID Provider driver from Novell that basically allows you to get IDs that are unique and next in sequence. There was the State Machine driver in the Compliance Management Platform for handling the lifecycle of users. I have not really looked at the Sentinel driver, but I imagine it is doing something somewhere in between. Now we have a couple more that are not doing traditional synchronization tasks.

I still do not fully understand what this driver is doing and why (does anyone? If so, I have questions for you!), but having read through a trace and many of the policies I can infer a fair bit. Since there is no documentation at this level of explanation (and it is not clear if Novell ought to even deliver such docs) I figured I would tackle it as best I can. You can see more of this approach at a Wiki page I maintain, where I am trying to encourage others to do similar write-ups for drivers, so that we can get the community to solve this documentation problem. Please feel free to take on a Policy object and try it. No need to try to bite off an entire driver. Even just working on something small like the Identity Audit policies would be helpful.
http://wiki.novell.com/index.php/Detailed_driver_walk_through_collection

Here is what I learned from walking through the trace of a driver starting up. (Thanks John D for the trace sample! This is all your fault!)

The driver startup is pretty normal. The Global Configuration Variables (GCVs) are loaded, then the Named Passwords, the policy objects, and the linked resource objects. An IDM 4 driver startup looks much the same as in earlier versions. Probably the biggest difference is that many objects and references start with the package SHORTNAME in all caps. I have to say that really catches my eye and makes it hard to read. It feels like the driver is shouting at me. Good driver, good driver, have a cookie, a query-ex cookie even!

I do not know if this is new to IDM 4 or just this driver, since I have never seen this before in a driver start:
[11/17/10 16:28:42.476]:Managed System Gateway Driver ST:Restricting file Permission for /var/opt/novell/eDirectory/data/dib/dx33288.lg
[11/17/10 16:28:42.476]:Managed System Gateway Driver ST:Restricting file Permission for /var/opt/novell/eDirectory/data/dib/dx33288.db

This is typically a state file for a driver, and for some, like the JDBC driver in triggerless mode, it can grow large. (There was a bug fixed in the IDM 3.6.1 release that stopped it from growing out of control, which was good news!)

Other than that everything loads up. Then on the first event through, it starts a number of cache filling events. They added a neat test I first saw in the SAP HR CMP drivers. You can read more about the fun stuff I learned there in this series of articles:














Before initializing any of these cache variables, which will be driver scoped and thus persist for the run time of the driver, they test whether the driver has finished starting up. Now you could choose to trigger on something you know happens at the end of the driver startup process, but this test is just as effective. Use the GCV dirxml.auto.driverdn to find the DN of the current driver, then query eDirectory for its Object Class attribute. If the driver is not yet up and running you will not get a response. Once the driver is running you get a response, and you then know it has finished loading.
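
To make that test concrete, here is a minimal Python sketch of the idea, not the actual policy. The trace excerpt in this article does not show the exact query the policy issues, so the query shape (an entry-scoped XDS query reading Object Class), the function names, and the sample DN are all my assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def build_driver_probe(driver_dn: str) -> ET.Element:
    """Build an XDS <query> that reads Object Class from the driver object itself.

    driver_dn stands in for the value of the dirxml.auto.driverdn GCV.
    The exact attributes the real policy sets are an assumption.
    """
    query = ET.Element("query", {"dest-dn": driver_dn, "scope": "entry"})
    ET.SubElement(query, "read-attr", {"attr-name": "Object Class"})
    return query

def driver_is_running(response: ET.Element) -> bool:
    """Treat any <instance> in the query response as 'startup has finished'."""
    return response.find(".//instance") is not None
```

The design point is simply that an empty response is a usable signal: nothing back means the driver is still loading, so cache initialization is deferred until a later event.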

It looks like they define some custom queries that, when submitted, do things other than a standard XDS query. That is, instead of actually querying for data, they return data from the driver scoped variable (aka the cache). Note this cache is unique to this driver and there is a series of rules that keeps it up to date. The reason for not returning the queried data from a live lookup is that it is basically static, and it takes a bit of processing to get it into a useful format. So why not just load it up once, and keep it there until the driver is restarted?

So far in the trace I have seen five of these custom API queries. Specifying the API name as the object class in the query makes them easy to manage. After all, when thinking about API calls, the object class is basically the right attribute to use.

SERVER_INTERFACE

MANAGED_SYSTEM_INFORMATION

DRIVER_DN

MANAGED_SYSTEM_MATCHING

MS_ACCOUNT_INFO



When a query comes in whose API name is set to one of these values, the driver does not let the query through directly on the Publisher channel. Rather, it reads the data out of the cache and appends it to an <operation-data> node as valid XML. Then, when eDirectory returns nothing on the Subscriber channel on the way back to the shim, the operation data is reattached as payload to the query result. Finally a rule in the Output Transform copies the operation data into the document and then strips out the empty response. This way the query is answered from cache. It will be easier to see when I show some of the trace down below.
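
The dispatch idea can be sketched in a few lines of Python. This is only an illustration of "answer from cache when the class name is a known API name"; it does not model the <operation-data> round trip through the channels, and the CACHE dictionary and answer_from_cache function are hypothetical names of mine. The sample instance is one of the DRIVER_DN entries from the trace shown later in this article.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the driver-scoped cache: API name -> cached <cache> XML.
CACHE = {
    "DRIVER_DN": ET.fromstring(
        '<cache><instance class-name="DRIVER_DN" '
        'src-dn="o=system\\cn=driverset1\\cn=hr application">'
        '<association>91234A9A-3264-4a4e-CBB2-91234A9A3264</association>'
        '</instance></cache>'
    ),
}

def answer_from_cache(query: ET.Element):
    """Return an <output> holding cached instances, or None for ordinary queries."""
    cached = CACHE.get(query.get("class-name"))
    if cached is None:
        return None  # not a custom API query; it would be processed as a normal query
    output = ET.Element("output")
    # Copy the nodes so callers cannot modify the cache through the result.
    output.extend(copy.deepcopy(inst) for inst in cached.findall("instance"))
    return output
```

A query with class-name="DRIVER_DN" gets the cached instance back, while a query for an ordinary class such as User falls through with None.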

In order to return data from the cache it has to populate it as the driver starts up.

First up, the shim reads back the Network Address information and converts it into a node set variable, which is conveniently traced by the policy, so I can show a typical example right here. You can see that the host eDirectory server has NCP, LDAP, and LDAPS values in the Network Address (which are of course base 64 encoded).

Note they store the <instance> document under a <cache> node. This makes XPATH on it easier for a variety of reasons.

<cache>
<instance class-name="SERVER_INTERFACE" src-dn="CN=idv,OU=servers,O=system">
<association>BD9194F1-001A-2549-5C8A-BD9194F1001A</association>
<attr attr-name="interface">
<value type="structured">
<component name="protocol">NCP</component>
<component name="address">172.17.5.111</component>
<component name="port">524</component>
</value>
<value type="structured">
<component name="protocol">LDAP</component>
<component name="address">172.17.5.111</component>
<component name="port">389</component>
</value>
<value type="structured">
<component name="protocol">LDAPS</component>
<component name="address">172.17.5.111</component>
<component name="port">636</component>
</value>
</attr>
</instance>
</cache>


This is used to make later connections to a specific server. For example, it might be used to pass the Reporting module the address to connect to over LDAP for a server. A nicely self-configuring touch.
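
As an illustration of how a consumer might pull one endpoint out of that structure, here is a small Python sketch that parses the cache XML shown above. The endpoint function is a hypothetical helper of mine, not something the driver or the Reporting module is known to contain.

```python
import xml.etree.ElementTree as ET

# The SERVER_INTERFACE cache from the trace above, reproduced compactly.
SERVER_INTERFACE = """<cache>
<instance class-name="SERVER_INTERFACE" src-dn="CN=idv,OU=servers,O=system">
<association>BD9194F1-001A-2549-5C8A-BD9194F1001A</association>
<attr attr-name="interface">
<value type="structured"><component name="protocol">NCP</component>
<component name="address">172.17.5.111</component><component name="port">524</component></value>
<value type="structured"><component name="protocol">LDAP</component>
<component name="address">172.17.5.111</component><component name="port">389</component></value>
<value type="structured"><component name="protocol">LDAPS</component>
<component name="address">172.17.5.111</component><component name="port">636</component></value>
</attr>
</instance>
</cache>"""

def endpoint(cache_xml: str, protocol: str):
    """Return (address, port) for the first interface using the given protocol, else None."""
    root = ET.fromstring(cache_xml)
    for attr in root.iter("attr"):
        if attr.get("attr-name") != "interface":
            continue
        for value in attr.findall("value"):
            # Flatten the structured value into a name -> text dictionary.
            parts = {c.get("name"): c.text for c in value.findall("component")}
            if parts.get("protocol") == protocol:
                return parts["address"], int(parts["port"])
    return None
```

Calling endpoint(SERVER_INTERFACE, "LDAPS") picks out the secure LDAP address and port, which is the kind of lookup a reporting component would need before connecting.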

Next it builds the Driver Cache value. Here it gets back all the DirXML-Driver objects under the driver set along with their association values, which are stored as pretty GUIDs. There is an ECMA function in the AJC (Advanced Java Class) library that takes the binary GUID value and converts it to this format. There are actually functions to convert it to the GUID format used by the Active Directory driver, the eDirectory driver, and finally the way Entitlements store it. I am pretty sure it is using the latter, since later on it is going to query for entitlements using this GUID value.

The data is stored in a local variable with this XML:

<cache>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=acme domain">
<association>B097B0E7-A932-2540-63A7-B097B0E7A932</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=control system access with groups">
<association>EFD9C700-463B-7049-A580-EFD9C700463B</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=data collection service driver">
<association>3A34A94D-0EDD-6449-599A-3A34A94D0EDD</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=delimited text driver">
<association>2D86AB20-F8C1-8742-0F87-2D86AB20F8C1</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=hardwareresource">
<association>055A0D5C-13D3-a04c-8DB9-055A0D5C13D3</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=hr application">
<association>91234A9A-3264-4a4e-CBB2-91234A9A3264</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=managed system gateway driver">
<association>76F400F1-3846-c34a-FDA5-76F400F13846</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=role and resource service driver">
<association>9E91C009-EFCE-3544-5786-9E91C009EFCE</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=smart card system creation">
<association>4BC30EF6-641A-9b4f-0996-4BC30EF6641A</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=uarpt server">
<association>E61EBA21-7C3E-244b-A4A5-E61EBA217C3E</association>
</instance>
<instance class-name="DRIVER_DN" src-dn="o=system\cn=driverset1\cn=user application">
<association>EE22B1A5-AB6A-8b4b-C686-EE22B1A5AB6A</association>
</instance>
</cache>



Next it looks at the various drivers, reading back the MSysInfo objects, under the driver container.

It does that with this query:

<nds dtdversion="3.5" ndsversion="8.x">
<source>
<product version="4.0.0">DirXML</product>
<contact>Novell, Inc.</contact>
</source>
<input>
<query class-name="DirXML-GlobalConfigDef" dest-dn="\IDM4-IDV-01\system\driverset1" scope="subtree">
<search-class class-name="DirXML-GlobalConfigDef"/>
<search-attr attr-name="CN">
<value>MSysInfo</value>
</search-attr>
<read-attr attr-name="DirXML-ConfigValues"/>
</query>
</input>
</nds>



That looks for any DirXML-GlobalConfigDef object, in this MSG's driver set's subtree, whose name (CN) is MSysInfo. In the case of the example trace I got from John D, there is only one driver that has such an object. Initially I thought it would loop through the cache nodeset of the driver names and then check for an MSysInfo object in each. However, this approach is probably more efficient, since one query gets back zero or many, depending on what is deployed in your system.

It looks like there is a Package called Managed System Information available for each driver that, if you include it, will add this GCV object under the Driver container. The naming is consistent in each package so the MSG driver can easily reference it.

The MSG driver then reads out a couple of attributes (GCV values in this case) from it, and builds the MANAGED_SYSTEM_INFORMATION cache variable. This will be one <instance> node per system that is supporting the MSG driver.


<cache>
<instance class-name="MANAGED_SYSTEM_INFORMATION" logical-instance="false" src-dn="cn=acme domain,cn=driverset1,o=system">
<attr attr-name="msysinfo.dyn.ms.driverGuid">
<value type="string">B097B0E7-A932-2540-63A7-B097B0E7A932</value>
</attr>
<attr attr-name="msysinfo.dyn.ms.hierarchical">
<value type="state">true</value>
</attr>
<attr attr-name="msysinfo.drv.ms.name">
<value type="string">ACME Domain Server</value>
</attr>
<attr attr-name="msysinfo.drv.ms.description">
<value type="string">ACME Domain Server</value>
</attr>
<attr attr-name="msysinfo.drv.ms.location">
<value type="string">Chicago</value>
</attr>
<attr attr-name="msysinfo.drv.ms.vendor">
<value type="string">Microsoft</value>
</attr>
<attr attr-name="msysinfo.drv.ms.version">
<value type="string">2003</value>
</attr>
<attr attr-name="msysinfo.drv.ms.businessOwner">
<value type="dn">cn=cnano,ou=users,o=data</value>
</attr>
<attr attr-name="msysinfo.drv.ms.applicationOwner">
<value type="dn">cn=tmellon,ou=users,o=data</value>
</attr>
<attr attr-name="msysinfo.drv.ms.classification">
<value type="string">Vital</value>
</attr>
<attr attr-name="msysinfo.drv.ms.environment">
<value type="string">Production</value>
</attr>
<attr attr-name="msysinfo.drv.ms.auto.generated.info.show">
<value type="string">false</value>
</attr>
<attr attr-name="msysinfo.drv.ms.id">
<value type="string">B097B0E7-A932-2540-63A7-B097B0E7A932</value>
</attr>
<attr attr-name="msysinfo.drv.ms.type">
<value type="string">AD</value>
</attr>
<attr attr-name="msysinfo.drv.ms.auth.ip">
<value type="string">app.idm.com</value>
</attr>
<attr attr-name="msysinfo.drv.ms.auth.port">
<value type="string"/>
</attr>
<attr attr-name="msysinfo.drv.ms.auth.id">
<value type="string">Administrator</value>
</attr>
</instance>
</cache>


They do this with a bunch of queries and loops, but there is one little bit of XPATH that I never considered using, though in hindsight of course it will work. It is interesting enough to be worth discussing.

To build the bulk of the MANAGED_SYSTEM_INFORMATION nodeset they use the following XPATH to loop through the values.

$gcvdef//definition[@name!='msysinfo.drv.ms.logicalInstances' and starts-with(@name, 'msysinfo.drv.ms.')]

The variable gcvdef is holding the XML of the GCV values from the MSysInfo object. Then they loop through, looking for any definition node (the // means find any such node at any depth, in a sort of recursive search) that matches the predicate of

[@name!='msysinfo.drv.ms.logicalInstances' and starts-with(@name, 'msysinfo.drv.ms.')]

That means only the nodes whose XML attribute 'name' is NOT equal to msysinfo.drv.ms.logicalInstances, AND whose 'name' XML attribute begins with the string 'msysinfo.drv.ms.'. You can see looking at the above cache for MANAGED_SYSTEM_INFORMATION that most of the attr nodes start with 'msysinfo.drv.ms.' (the two msysinfo.dyn.ms.* values are the exception). I never thought of trying to loop that way, but why not! Probably not the fastest approach, but it will work fine on small node sets.

To make that XPATH easier to understand, here is a trimmed-down version of the XML that it is operating on. You can see that they needed to use //definition (find any definition subnode), since a definition could be directly under the <definitions> node, under a <group> node, or inside a <group> node under a <subordinates> node. Those are used to build a group that you can hide or show in the interface. All the <definition> nodes under the <subordinates> node will collapse or expand together, based on the <definition> right under the <group> node. So the desired nodes could be at many different levels in the document.

<configuration-values>
<definitions display-name="Managed System Information">
<header display-name="General Information"/>
<definition display-name="Name" name="msysinfo.drv.ms.name" type="string">
<description>Specify a descriptive name for the managed system.</description>
<value>ACME Domain Server</value>
</definition>
<definition display-name="Description" multi-line="true" name="msysinfo.drv.ms.description" type="string">
<description>Specify a brief description of the managed system</description>
<value>ACME Domain Server</value>
</definition>
<header display-name="Connection And Miscellaneous Information (auto-generated, do not change!)"/>
<group>
<definition display-name="Connection and miscellaneous information" name="msysinfo.drv.ms.auto.generated.info.show" type="enum">
<description>Show auto-generated information</description>
<enum-choice display-name="show">true</enum-choice>
<enum-choice display-name="hide">false</enum-choice>
<value>false</value>
</definition>
<subordinates active-value="true">
<definition display-name="ID" name="msysinfo.drv.ms.id" type="string">
<description>Uniquely identifies the managed system.</description>
<value>B097B0E7-A932-2540-63A7-B097B0E7A932</value>
</definition>
</subordinates>
</group>
</definitions>
</configuration-values>
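
To see the predicate at work outside of a policy, here is a minimal Python sketch that applies the same two conditions to a reduced copy of the XML above. Python's standard ElementTree does not implement the XPath starts-with() function, so the filter is written out by hand; a full XPath 1.0 engine could evaluate the original expression directly. The msysinfo.drv.ms.logicalInstances definition in the sample is added by me so there is something to exclude (its type and empty value are made up), and select_definitions is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

GCV_XML = """<configuration-values>
<definitions display-name="Managed System Information">
<definition name="msysinfo.drv.ms.name" type="string"><value>ACME Domain Server</value></definition>
<!-- Assumed entry, present only so the exclusion has something to act on. -->
<definition name="msysinfo.drv.ms.logicalInstances" type="string"><value/></definition>
<group>
<definition name="msysinfo.drv.ms.auto.generated.info.show" type="enum"><value>false</value></definition>
<subordinates active-value="true">
<definition name="msysinfo.drv.ms.id" type="string"><value>B097B0E7-A932-2540-63A7-B097B0E7A932</value></definition>
</subordinates>
</group>
</definitions>
</configuration-values>"""

def select_definitions(gcvdef: ET.Element):
    """Mimic $gcvdef//definition[@name!='...logicalInstances' and starts-with(@name,'msysinfo.drv.ms.')]."""
    prefix = "msysinfo.drv.ms."
    excluded = "msysinfo.drv.ms.logicalInstances"
    return [
        d for d in gcvdef.iter("definition")  # iter() walks every depth, like //
        if d.get("name", "").startswith(prefix) and d.get("name") != excluded
    ]

names = [d.get("name") for d in select_definitions(ET.fromstring(GCV_XML))]
```

Here names ends up holding msysinfo.drv.ms.name, msysinfo.drv.ms.auto.generated.info.show, and msysinfo.drv.ms.id, in document order. The top-level, grouped, and subordinate definitions are all found, and the logicalInstances one is skipped.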


As you can see this driver sure is doing some interesting things! The scary part is that next, the driver is going to read the Create and Matching rules to try to infer which object classes it cares about. Wait until you see how they are doing it! It looks pretty good, but alas I can think of many reasonable ways to write a Match or Create rule from which it would not correctly infer the data. However you have to respect the attempt, it is quite an impressive approach. Stay tuned for much more on that topic in part 2 of this series. There is a ton of interesting trace to work through, so this might take a while!
