Advanced usage of the Generic File driver

By: scauwe

Introduction


The Generic File driver has been out for some time now, and I have received lots of positive feedback and usage scenarios, including a Cool Solutions article. Based on this feedback, I want to present some advanced uses of the driver.

Driver goal


What I want to present here is how to import (aka "initial load") different object classes with a single Generic File driver instance, while letting the IDM engine establish the relations between them. For this to work, we use the following features of the driver:

  1. Use CSV file headers on the publisher channel.

  2. Get the file name on all publisher commands.

  3. ECMAScript-based association calculation.


In this example, we import users, groups and group memberships with a single driver. This is done by placing three files on the server: user.csv, group.csv and usergroup.csv.

The end result should be that the user CSV file creates users (and thus generates commands for class user), that the group CSV file does the same for groups, and that the user-group CSV file adds group memberships to the user objects (and thus also generates commands for class user).

Note: the example is meant for a so-called "initial load". Reloading the user.csv file will reset (clear) all group memberships. If you need different behavior, the policies have to be modified accordingly.

Use CSV file headers on the publisher channel


The user, group and user-group files do not share the same schema. Because of this, we cannot use the schema defined on the shim; instead, we rely on the CSV headers. Each file starts with a header line, as listed in the following table:

File            Header
user.csv        id,givenName,name,mail
group.csv       id,name,description
usergroup.csv   userid,groupid

The publisher shim must be configured to use the headers in the file (see the publisher options, option csvReader_UseHeaderNames). The shim then consumes the CSV header and uses its field names as attribute names on the publisher channel. As a result, a command for a user received on the publisher looks like this:
<add class-name="ToDo">
  <add-attr attr-name="id">
    <value type="state">USR1234</value>
  </add-attr>
  <add-attr attr-name="name">
    <value type="state">aName</value>
  </add-attr>
  <add-attr attr-name="givenName">
    <value type="state">aGiveName</value>
  </add-attr>
  <add-attr attr-name="mail">
    <value type="state">aMail@comp.com</value>
  </add-attr>
</add>

A command for a group, on the other hand, looks like this:
<add class-name="ToDo">
  <add-attr attr-name="id">
    <value type="state">GRP1234</value>
  </add-attr>
  <add-attr attr-name="name">
    <value type="state">grpName</value>
  </add-attr>
  <add-attr attr-name="description">
    <value type="state">grpDescription</value>
  </add-attr>
</add>
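For comparison, a row from usergroup.csv only carries the two columns of its header. Assuming the same shim configuration, and reusing the sample IDs from the commands above as illustrative values, it would arrive on the publisher roughly as:

```xml
<!-- Illustrative only: a usergroup.csv row as received, before any policy runs -->
<add class-name="ToDo">
  <add-attr attr-name="userid">
    <value type="state">USR1234</value>
  </add-attr>
  <add-attr attr-name="groupid">
    <value type="state">GRP1234</value>
  </add-attr>
</add>
```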

The attributes in the received command are thus not defined by the schema in the driver configuration, but by the header in the CSV file: the schema defined in the file (the CSV header) takes precedence over the schema defined in the shim.

One downside of this is that the case (upper or lower) of the attribute names is determined not by the shim, but by whoever generates the CSV file, and is thus out of our control. To prevent errors caused by differences in case, it is best practice to lowercase (or uppercase) the names of all attributes received, and to use lowercase (or uppercase) attribute names in all policies.
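A minimal DirXML-Script sketch of that practice, assuming it is the first rule in the input transformation and that the rows arrive as add events (as in the examples in this article). It loops over every add-attr element of the current operation and rewrites its attr-name in lower case. Every later rule must then use the lowercased names (for example givenname instead of givenName); the description text is illustrative.

```xml
<rule>
  <description>Lowercase all attribute names taken from the CSV header</description>
  <conditions>
    <and>
      <if-operation op="equal">add</if-operation>
    </and>
  </conditions>
  <actions>
    <!-- Visit each add-attr element of the current add event -->
    <do-for-each>
      <arg-node-set>
        <token-xpath expression="$current-op/add-attr"/>
      </arg-node-set>
      <arg-actions>
        <!-- Overwrite attr-name with its lowercased value -->
        <do-set-xml-attr expression="$current-node" name="attr-name">
          <arg-string>
            <token-lower-case>
              <token-xpath expression="string($current-node/@attr-name)"/>
            </token-lower-case>
          </arg-string>
        </do-set-xml-attr>
      </arg-actions>
    </do-for-each>
  </actions>
</rule>
```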

Get the file name on all publisher commands


In the publisher commands above, the class name of both the user and the group command is 'ToDo'. It should become 'user' or 'group' respectively.

A poor man's solution is to inspect the attributes received and infer the class from them (e.g. only group objects have a 'description' attribute).

The Generic File driver offers a more elegant option: the shim can add meta data to each command (as regular attributes), including the name of the file being processed. We use this feature to set the correct class name in a policy: since the event now carries the file name on top of all the attributes from the CSV content, a rule in the input transformation can change the class based on that file name. To achieve this, we need to do the following:

  • Configure the publisher channel to add the following meta data: fileName

  • In the input transformation, for every add event, take this new attribute and derive the class-name attribute from its value.


Before the input transformation policy, the event looks like this:
<add class-name="ToDo">
  <add-attr attr-name="id">
    <value type="state">USR1234</value>
  </add-attr>
  <add-attr attr-name="name">
    <value type="state">aName</value>
  </add-attr>
  <add-attr attr-name="givenName">
    <value type="state">aGiveName</value>
  </add-attr>
  <add-attr attr-name="mail">
    <value type="state">aMail@comp.com</value>
  </add-attr>
  <add-attr attr-name="fileName">
    <value type="state">User_2015_SEPT_09.csv</value>
  </add-attr>
</add>

For every add event, the input transformation takes the fileName value and converts it to the class name using a regular expression replacement:
<do-set-op-class-name>
  <arg-string>
    <token-lower-case>
      <token-replace-first regex="(?i)([^_]+)(_?.*\.csv)" replace-with="$1">
        <token-op-attr name="fileName"/>
      </token-replace-first>
    </token-lower-case>
  </arg-string>
</do-set-op-class-name>

After the input transformation, the class name is set to 'user'. Note: the sample policy supports file names of the form <className>[_<someIdentifier, e.g. a date/time>].csv, where the class name may be written in any case.
<add class-name="user">
  <add-attr attr-name="id">
    <value type="state">USR1234</value>
  </add-attr>
  <add-attr attr-name="name">
    <value type="state">aName</value>
  </add-attr>
  <add-attr attr-name="givenName">
    <value type="state">aGiveName</value>
  </add-attr>
  <add-attr attr-name="mail">
    <value type="state">aMail@comp.com</value>
  </add-attr>
  <add-attr attr-name="fileName">
    <value type="state">User_2015_SEPT_09.csv</value>
  </add-attr>
</add>
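For reference, a sketch of how the class-name action could be wrapped in a complete input-transformation rule. The conditions (only add events that carry the fileName meta data) follow the description above but are an assumption, not copied from the attached package. With this regular expression, User_2015_SEPT_09.csv yields user, group.csv yields group and usergroup.csv yields usergroup.

```xml
<rule>
  <description>Derive the class name from the source file name</description>
  <conditions>
    <and>
      <if-operation op="equal">add</if-operation>
      <if-op-attr name="fileName" op="available"/>
    </and>
  </conditions>
  <actions>
    <!-- Keep the part before the first underscore (or before .csv), then lowercase it -->
    <do-set-op-class-name>
      <arg-string>
        <token-lower-case>
          <token-replace-first regex="(?i)([^_]+)(_?.*\.csv)" replace-with="$1">
            <token-op-attr name="fileName"/>
          </token-replace-first>
        </token-lower-case>
      </arg-string>
    </do-set-op-class-name>
  </actions>
</rule>
```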

This sets the class name correctly for user.csv and group.csv. For usergroup.csv, the policy yields 'usergroup', so we need an additional test that changes the class name to 'user'.
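One way to express that extra test is a second rule placed directly after the one that derives the class name from the file name. It is a sketch under that placement assumption: it relies on the earlier rule having already set the class name to usergroup.

```xml
<rule>
  <description>Treat usergroup.csv rows as events for class user</description>
  <conditions>
    <and>
      <if-operation op="equal">add</if-operation>
      <if-class-name op="equal">usergroup</if-class-name>
    </and>
  </conditions>
  <actions>
    <!-- Memberships are stored on the user object, so address the user class -->
    <do-set-op-class-name>
      <arg-string>
        <token-text xml:space="preserve">user</token-text>
      </arg-string>
    </do-set-op-class-name>
  </actions>
</rule>
```

Later policies can still tell these events apart from plain user rows by the presence of the groupid attribute.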

ECMAScript-based association calculation


The example commands shown so far are still missing associations. The Generic File driver shim calculates associations with an ECMAScript expression. In this example, the association for both users and groups is the id attribute. The user-group rows, however, have no id; they only carry a userid and a groupid. Since we are going to transform each user-group row into a user modify (adding a group membership), we use the userid as the association.
The resulting ECMAScript expression is:
if (typeof id !== 'undefined'){id} else {userid}

Note that this is a one-liner: iManager has issues with multi-line driver configuration parameters.

This sets the basic association and works as long as a user and a group can never have the same id. If that assumption does not hold (which is most likely), the association should be extended, e.g. to include the class name, so that it becomes <class>=<id>. This is simple to do in an IDM policy:
<do-set-op-association>
  <arg-association>
    <token-class-name/>
    <token-text xml:space="preserve">=</token-text>
    <token-association/>
  </arg-association>
</do-set-op-association>
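A sketch of that action inside a complete rule. It assumes the rule runs in the input transformation after the class-name rules, so that token-class-name returns the lowercase application class names (user, group) rather than the eDirectory class names produced by the schema mapping; the conditions are illustrative.

```xml
<rule>
  <description>Prefix the association with the class name</description>
  <conditions>
    <and>
      <if-operation op="equal">add</if-operation>
      <if-association op="available"/>
    </and>
  </conditions>
  <actions>
    <!-- Rebuild the association as <class>=<association calculated by the shim> -->
    <do-set-op-association>
      <arg-association>
        <token-class-name/>
        <token-text xml:space="preserve">=</token-text>
        <token-association/>
      </arg-association>
    </do-set-op-association>
  </actions>
</rule>
```

With the sample data, the user row USR1234 and the usergroup row with userid USR1234 both end up with the association user=USR1234, so they address the same object, while the group row GRP1234 gets group=GRP1234. That last value is the format a group reference has to match.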

Miscellaneous policies



  1. The only thing left is to get the IDM engine to resolve the group associations (from usergroup.csv) for us. To do this, mark the groupid attribute as a DN and set its value to the group's association value.
    <do-reformat-op-attr name="groupid">
    <arg-value type="dn">
    <token-text xml:space="preserve">group=$current-value$</token-text>
    </arg-value>
    </do-reformat-op-attr>

    The above sets the type of the groupid attribute to "dn" and changes the value so that it matches the class-prefixed association format (group=<id>) introduced earlier.
    Next, set the association-ref XML attribute of that value to the same string:
    <do-set-xml-attr expression='$current-op//add-attr[@attr-name="groupid"]/value' name="association-ref">
        <arg-string>
            <token-op-attr name="groupid"/>
        </arg-string>
    </do-set-xml-attr>


  2. Since we are using the schema as defined in the CSV file, we depend on the third party that generates the file. To minimize potential issues, it is advisable to lowercase all field names.

  3. Group memberships from the usergroup.csv file should result in modify events, not adds. To achieve this, the event transformation catches user add events that carry a group membership (the groupid attribute) and converts them to modify events.
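For the sample usergroup row, the event produced by that conversion would look roughly like the following. This is a sketch of the target event only, assuming the association and groupid policies above have already run; the groupid attribute name is kept as in the CSV, and mapping it to the eDirectory Group Membership attribute (for example in the schema mapping policy) is an assumption.

```xml
<!-- Illustrative target: a usergroup.csv row converted from add to modify -->
<modify class-name="user">
  <association>user=USR1234</association>
  <modify-attr attr-name="groupid">
    <!-- add-value (not a replace) so each row adds one membership without clearing the others -->
    <add-value>
      <value type="dn" association-ref="group=GRP1234">group=GRP1234</value>
    </add-value>
  </modify-attr>
</modify>
```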


Conclusion


With minimal changes, a single Generic File driver can import several CSV files with relations between them. The same principle can be extended to more classes and relations as desired.

The attached zip file contains the driver package and the three sample CSV files. All configuration parameters in the package should be correct, but make sure to set the publisher source folder and the temporary work folder.

 
