How to Create CSV Export Files Without the Delimited Text Driver

By Justin Birt

Introduction



Novell's Identity Manager provides a wide selection of drivers that allow you to connect with pretty much any system. Some of these drivers are included with the base product, such as the eDirectory, Active Directory and LDAP drivers. Others, such as the JDBC and Delimited Text drivers, are bundled into Integration Module groups that, from a licensing perspective, cost extra.



I have been working on an IDM project with a requirement to supply CSV-type files from the Identity Vault for daily upload to a VLE Portal. The obvious choice for this was the Delimited Text driver. However, the additional cost of Integration Module activation got me thinking about alternatives.



Was there another way to meet the requirement without using the Delimited Text driver?



My thoughts turned to a personal favourite, ECMA. First introduced with IDM v3.5, ECMA scripting allows you to extend the capabilities of DirXML Script by writing your own functions, which are accessible from within Policy Rules by way of XPATH. So with ECMA recruited to handle file-writing duties, we are left with data acquisition and scheduling.




For those of you who are new to ECMA scripting, take a look at a previous article of mine, Using ECMA Scripts in IDM Policy Rules, which illustrates their use.



Whilst I'm on the subject: while I was researching and writing this article, the fantastically prolific Geoff Carmen posted Feature Article: Open Call for useful ECMA functions to use in Identity Manager, which has some more examples. In it you will find a function for writing out to a log file, submitted by Nathan Spears. It bears a striking resemblance to the ECMA function described in this article, which steals my thunder somewhat, but it just goes to show you can't keep a good script down!




I have to confess that I have never taken the time to explore the Null, Loopback and Manual Task drivers fully, but I selected the Null Service driver for the task in hand and found it was the perfect candidate for data acquisition and scheduling as described in this solution.



This article examines a configuration that uses the Null Service driver, Jobs and some ECMA script to generate a CSV export file, a task for which the Delimited Text driver might normally be considered.



In situations where budgets are stretched and there is no other requirement to deploy the Delimited Text driver, this solution may be of use.



Environment



This solution was developed and tested on a VMware SLES 10 SP1 instance with eDirectory v8.8.x and IDM v3.5.1.



The Null Driver Configuration



Like the Loopback driver, the Null driver acts on the Identity Vault itself. Events from the Identity Vault appear on the Subscriber channel and can be detected and acted upon in the normal way, but that is as far as they go: unlike the Loopback driver, the Subscriber channel is not connected to the Publisher channel. With the Null driver, nothing comes back on the Publisher channel except heartbeat messages.



A typical application for the Null driver might be email notification in response to certain events in the Identity Vault.



In this example we are going to use the Null driver in combination with a Job to generate a trigger document rather than respond to a broad spectrum of events from the Identity Vault.



Event Policy



Because we are not interested in any events other than triggers, a rule in the Subscriber Event Policy vetoes all events except triggers. This will minimise unnecessary processing.



The code for this is shown below:




<rule>
<description>Only Pass Trigger Events</description>
<comment xml:space="preserve">Only allow trigger documents to pass</comment>
<conditions>
<and>
<if-operation mode="case" op="not-equal">trigger</if-operation>
</and>
</conditions>
<actions>
<do-veto/>
</actions>
</rule>




Command Policy



All the 'heavy lifting' in this solution occurs in the Subscriber Command Policy described in the following subsections. No other Policies are required. The complete DirXML Script is shown in the code block that follows and then some more detailed explanations are provided for specific code sections.




<rule>
<description>User: Create CSV Export</description>
<conditions>
<and>
<if-operation mode="nocase" op="equal">trigger</if-operation>
<if-xpath op="true">contains(@source, "Test Trigger")</if-xpath>
</and>
</conditions>
<actions>
<do-set-local-variable name="LV_Users" scope="policy">
<arg-node-set>
<token-query class-name="User" datastore="src" max-result-count="200">
<arg-dn>
<token-text xml:space="preserve">IDV\USERS</token-text>
</arg-dn>
<arg-string>
<token-text xml:space="preserve">Given Name</token-text>
</arg-string>
<arg-string>
<token-text xml:space="preserve">Surname</token-text>
</arg-string>
<arg-string>
<token-text xml:space="preserve">Internet EMail Address</token-text>
</arg-string>
</token-query>
</arg-node-set>
</do-set-local-variable>
<do-set-local-variable name="LV_OutputHeader" scope="policy">
<arg-string>
<token-text xml:space="preserve">Given Name,Surname,Email</token-text>
<token-char value="10"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_NodeCount" scope="policy">
<arg-string>
<token-text xml:space="preserve">0</token-text>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputHeader)"/>
</arg-string>
</do-set-local-variable>
<do-for-each>
<arg-node-set>
<token-local-variable name="LV_Users"/>
</arg-node-set>
<arg-actions>
<do-set-local-variable name="LV_OutputString" scope="policy">
<arg-string>
<token-xpath expression="$current-node/attr[@attr-name='Given Name']/value/text()"/>
<token-global-variable name="CSVDelimiter"/>
<token-xpath expression="$current-node/attr[@attr-name='Surname']/value/text()"/>
<token-global-variable name="CSVDelimiter"/>
<token-xpath expression="$current-node/attr[@attr-name='Internet EMail Address']/value/text()"/>
<token-char value="10"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputString)"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_NodeCount" scope="policy">
<arg-string>
<token-xpath expression="$LV_NodeCount + 1"/>
</arg-string>
</do-set-local-variable>
</arg-actions>
</do-for-each>
<do-set-local-variable name="LV_OutputFooter" scope="policy">
<arg-string>
<token-text xml:space="preserve">***FileFooter</token-text>
<token-global-variable name="CSVDelimiter"/>
<token-local-variable name="LV_NodeCount"/>
<token-global-variable name="CSVDelimiter"/>
<token-time format="HH:mm:ss MM/dd/yyyy" lang="en-GB" tz="GMT0"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputFooter)"/>
</arg-string>
</do-set-local-variable>
<do-veto/>
</actions>
</rule>



Condition Block



A simple set of conditions is defined, as illustrated in the code snippet below.




<conditions>
<and>
<if-operation mode="nocase" op="equal">trigger</if-operation>
<if-xpath op="true">contains(@source, "Test Trigger")</if-xpath>
</and>
</conditions>



As can be seen, there are two conditions to satisfy. The first checks again that the document in the channel is a trigger document. Because of the Event policy described earlier, no other documents should make it this far, but it doesn't hurt to check.



The second condition checks that the name of the trigger document is the correct one, 'Test Trigger' in this example.




One of the useful things about triggers is that they have a name. It is possible to set up multiple triggers in the driver and have different rules apply, depending on which trigger is being processed. This allows lots of different jobs to be handled by one driver instance. Each job can trigger a different policy rule allowing multiple CSV files to be generated.


So, we have detected the required trigger; what are we going to do about it?



Action Block



The first action sets a node-set local variable LV_Users to the result set of a query against the Identity Vault in a context of IDV\USERS. You can see that I have selected a number of user attributes for illustration purposes; any attributes could be selected here, depending on the requirement. You will obviously set the context of the query appropriately for your Identity Vault.




<do-set-local-variable name="LV_Users" scope="policy">
<arg-node-set>
<token-query class-name="User" datastore="src" max-result-count="200">
<arg-dn>
<token-text xml:space="preserve">IDV\USERS</token-text>
</arg-dn>
<arg-string>
<token-text xml:space="preserve">Given Name</token-text>
</arg-string>
<arg-string>
<token-text xml:space="preserve">Surname</token-text>
</arg-string>
<arg-string>
<token-text xml:space="preserve">Internet EMail Address</token-text>
</arg-string>
</token-query>
</arg-node-set>
</do-set-local-variable>



The second action constructs the CSV header as a string local variable LV_OutputHeader. Obviously, the header fields correspond to the attributes selected in the query.




Note that the string ends with a newline character, char(10).



<do-set-local-variable name="LV_OutputHeader" scope="policy">
<arg-string>
<token-text xml:space="preserve">Given Name,Surname,Email</token-text>
<token-char value="10"/>
</arg-string>
</do-set-local-variable>



A local variable LV_NodeCount is initialised with the value of 0. This variable is going to keep track of the total number of records returned by the query.




<do-set-local-variable name="LV_NodeCount" scope="policy">
<arg-string>
<token-text xml:space="preserve">0</token-text>
</arg-string>
</do-set-local-variable>



Next, the value stored earlier in the variable LV_OutputHeader is written out to a file using a call to an ECMA function. I explore this in detail later, so I won't dwell on it now. The global variable GCV_USERS holds the path of the file to be written.




On my Linux installation I set the value of GCV_USERS to be /var/tmp/export.csv. You can obviously set this path to be anything you like.



<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputHeader)"/>
</arg-string>
</do-set-local-variable>




The next block of code is a loop that steps through the node-set variable LV_Users created in the first action and constructs the CSV rows by concatenating the attribute values in each node, separated by the delimiter, with a trailing newline. The loop also calls the ECMA function to write out each CSV record and increments the counter variable.




I created a GCV called CSVDelimiter to hold the delimiter character so that alternatives could easily be set if required.




<do-for-each>
<arg-node-set>
<token-local-variable name="LV_Users"/>
</arg-node-set>
<arg-actions>
<do-set-local-variable name="LV_OutputString" scope="policy">
<arg-string>
<token-xpath expression="$current-node/attr[@attr-name='Given Name']/value/text()"/>
<token-global-variable name="CSVDelimiter"/>
<token-xpath expression="$current-node/attr[@attr-name='Surname']/value/text()"/>
<token-global-variable name="CSVDelimiter"/>
<token-xpath expression="$current-node/attr[@attr-name='Internet EMail Address']/value/text()"/>
<token-char value="10"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputString)"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_NodeCount" scope="policy">
<arg-string>
<token-xpath expression="$LV_NodeCount + 1"/>
</arg-string>
</do-set-local-variable>
</arg-actions>
</do-for-each>
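Outside the engine, the per-row concatenation performed by the loop above can be sketched in plain JavaScript. The `csvRow` helper is my own illustration, not part of the driver policy; note that it also quotes fields containing the delimiter, a quote or a newline, which the policy above does not do:

```javascript
// Sketch of the per-row concatenation in the loop above.
// csvRow is a hypothetical helper, not part of the driver policy.
function csvRow(values, delimiter) {
    return values.map(function (v) {
        var s = String(v == null ? "" : v);
        // A field containing the delimiter, a quote or a newline must be
        // quoted, or downstream parsers will mis-split the record.
        if (s.indexOf(delimiter) !== -1 || s.indexOf('"') !== -1 || s.indexOf("\n") !== -1) {
            return '"' + s.replace(/"/g, '""') + '"';
        }
        return s;
    }).join(delimiter) + "\n";
}

// Example record using the three attributes queried in the policy
// (the values here are made up for illustration).
var line = csvRow(["Justin", "Birt", "jbirt@example.com"], ",");
// line === "Justin,Birt,jbirt@example.com\n"
```

If the attribute values in your Vault can themselves contain the delimiter character, building this kind of quoting into the policy (or into the ECMA function) is worth considering.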



Finally, in the last block of code, a footer for the CSV file is written. This is where the variable LV_NodeCount is used. You can see that LV_OutputFooter is the concatenation of three delimiter-separated elements: a literal marker, the record count and a timestamp. This is once again written to the CSV file using the ECMA Script function.




<do-set-local-variable name="LV_OutputFooter" scope="policy">
<arg-string>
<token-text xml:space="preserve">***FileFooter</token-text>
<token-global-variable name="CSVDelimiter"/>
<token-local-variable name="LV_NodeCount"/>
<token-global-variable name="CSVDelimiter"/>
<token-time format="HH:mm:ss MM/dd/yyyy" lang="en-GB" tz="GMT0"/>
</arg-string>
</do-set-local-variable>
<do-set-local-variable name="LV_Test" scope="policy">
<arg-string>
<token-xpath expression="es:fileWrite( $GCV_USERS, $LV_OutputFooter)"/>
</arg-string>
</do-set-local-variable>
<do-veto/>



So there you have it. What you get out is a CSV file with a field header, the output of the query as records, and a footer containing a count of the records and a date stamp. Now let's look at the ECMA Script that handles the writing.
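For illustration, the footer record can be modelled in plain JavaScript. The `csvFooter` helper is my own sketch, and the timestamp code only approximates the `token-time` pattern `HH:mm:ss MM/dd/yyyy` used in the policy:

```javascript
// Sketch of the footer record written by the last action:
//   ***FileFooter,<record count>,<timestamp>
function csvFooter(recordCount, delimiter) {
    var now = new Date();
    function pad(n) { return (n < 10 ? "0" : "") + n; }
    // Approximates the token-time format HH:mm:ss MM/dd/yyyy in GMT
    var stamp = pad(now.getUTCHours()) + ":" + pad(now.getUTCMinutes()) + ":" +
        pad(now.getUTCSeconds()) + " " + pad(now.getUTCMonth() + 1) + "/" +
        pad(now.getUTCDate()) + "/" + now.getUTCFullYear();
    return ["***FileFooter", String(recordCount), stamp].join(delimiter) + "\n";
}

var footer = csvFooter(200, ",");
// e.g. "***FileFooter,200,14:07:33 06/01/2008\n"
```

A downstream consumer can use the count in this footer to verify that the file transferred completely.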



ECMA



Novell introduced support for ECMA scripts with IDM v3.5. Scripts are contained in library objects that can be created globally within a driver set, or more locally within a specific driver. ECMA functions are created within the library object and called using XPATH from within policy rules.



The ECMA Script function fileWrite() used to write to the file system is shown below.




importPackage(Packages.java.io);

function fileWrite(filepath, data){

    var error_code = 0, out = null;

    try{
        // Open in append mode (the second FileWriter argument);
        // the file is created if it does not already exist
        out = new java.io.BufferedWriter(new java.io.FileWriter(filepath, true));
        out.write(data);
    }
    catch(e){
        error_code = e;
    }
    finally{
        // Guard the close: 'out' is never assigned if the
        // FileWriter constructor itself throws
        if (out != null){
            out.close();
        }
    }

    return error_code;
}



The first line of the script tells the system that we wish to make calls to classes in the java.io package.



The function fileWrite() accepts two parameters.



  • filepath holds the full path of the file to be written. If the file doesn't already exist, it is created. The value is stored as a global variable.


  • data contains the string format data to be written out to the file.



The first line of code in the function declares the variables. In the case of error_code this is also initialised.



The main elements of the function occur within a try/catch construct.




This structure is very useful and supported in many modern programming languages. It works such that, should any statement within the try block fail, rather than causing a catastrophic stop in the code, the exception is caught and code execution may continue.


First, a new instance of the java.io.FileWriter object is initialised, with the variable filepath passed as a parameter. You will remember that this was stored in a GCV. You will also note that the boolean value true is passed as an additional parameter to the FileWriter call. This sets append mode, so make sure you move the CSV file away from its configured location between runs, or it will be appended to every time the Job triggers the rule!



This new object is in turn passed as a parameter when initialising an instance of java.io.BufferedWriter. Lastly, the write() method of the BufferedWriter instance created in the previous line is invoked with the data variable passed as a parameter. You will remember that data holds the CSV-formatted string created in the Command Policy rule detailed earlier.



This function returns 0 to the calling XPATH statement if all is well, or the caught error if the write fails.



Subscriber Channel Trigger



In order to invoke the Command Policy Rule described in earlier sections, a Job object configured as a Subscriber Channel Trigger is used.



Jobs of this type can be thought of as cron jobs for IDM. You configure the date and time you wish the Job to occur and at that time a trigger document is produced. Fantastic!



Rather than repeat what has gone before: Geoff Carmen (him again!) wrote an article on Jobs, focusing specifically on Subscriber Channel Triggers, so if you are unfamiliar with them, take a look at Identity Manager new feature: Jobs for a step-by-step guide on how to configure this type of job.



Scalability



IDM is a Java application, as are the drivers for the most part, so the common constraints imposed on applications running on a JRE apply here too.



Having read a number of articles on the subject, consensus has it that the theoretical memory limits centre around the 2 GB mark.



This space is occupied by eDirectory, the IDM engine and any deployed drivers, so in a practical deployment considerable care should be taken to ensure driver designs do not consume too much resource.



In early versions of this solution, I didn't limit the number of records returned by the query. When testing with an IDV containing 10,000 user objects and retrieving about 8 attributes for each user, I was rewarded with a stack overflow!



Once I had limited the results returned by the query to 200 or so, all was well and CSV files for this number of users were created in about 35 seconds.



It would be sensible when deploying this solution to monitor memory usage carefully within your particular environment and adjust how many records the query returns accordingly. Similarly, the Job could be scheduled 'out-of-hours', when activity and resource utilisation by other drivers is low.



This solution has been tested on an Identity Vault with just over 20,000 user objects in it, so I'm pretty confident it will scale well.



Conclusions



In IDM deployments where there is no other requirement to activate the Delimited Text driver, this solution provides a flexible and scalable method for generating CSV data exports from the Identity Vault.

Comment List
  • Nice solution !! Thanks.

    Wish to know whether we can use the same solution to write the file to an external drive ( NAS drive). Tried providing the UNC path of that NAS drive in GCV. But that didn't help. But pointing to a local drive works. Suggestions please.
  • Well written, and easy to follow thanks to the explanation of each part.
    I have some concerns about memory usage though, and whether using the Delimited Text driver would make a difference for creating large reports.