How to handle parallel processing

Because IDM is multithreaded, several drivers can process events at the same time, which means race conditions can occur.

IDM is event based: something changes, and that causes something else to happen. For example, a driver processes input, which modifies an object, which triggers an event. That event can (and usually does) cause another driver to act on the modified object. The main issue is that at this point we do not know whether more modifications to the same object are still waiting to be processed, and those should be finished before other drivers are allowed to act on the object.

This is mostly seen with drivers based on the delimited-text driver, such as SOAP and SAP, because they do batch processing. Input is normally delivered in files or streams containing thousands of modifications, and it is usually not ordered: the first line might add an object, and a few thousand lines later another line might modify the same object.

Normally this causes no problems: Driver A receives a modification and applies it to the object, and then Drivers B, C and D synchronize the change to other systems.

Once in a while, however, you need to control when the other drivers are allowed to act.

For example:

User1, new email
User2, new last name
User3, new location
User1, new last name
User2, new email
User1, new location
User3, new email
User1, new location

Each of these lines modifies an object, and one or more of the modifications can trigger a move, either in the vault or in one or more connected systems.

With the small example above there will never be an issue. But if the input file has thousands of lines (files larger than 250 MB are not uncommon), there is enough time between the first and the last modification of an object for another driver to pick up the first modification before the last one has been applied. In these cases we want nothing to happen to the object until we are finished.
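For illustration, this is roughly how a few of the example lines could look once the driver has converted them into an XDS input document, assuming (as the rest of this approach does) that the whole file arrives as one document. The associations, the field names (email, location) and the values are hypothetical; the real document depends on the driver's configuration and schema:

```xml
<nds dtdversion="4.0">
  <input>
    <!-- line 1: User1, new email (hypothetical field name and value) -->
    <modify class-name="User" src-dn="User1">
      <association>User1</association>
      <modify-attr attr-name="email">
        <remove-all-values/>
        <add-value>
          <value>user1@example.com</value>
        </add-value>
      </modify-attr>
    </modify>
    <!-- line 3: User3, new location -->
    <modify class-name="User" src-dn="User3">
      <association>User3</association>
      <modify-attr attr-name="location">
        <remove-all-values/>
        <add-value>
          <value>Oslo</value>
        </add-value>
      </modify-attr>
    </modify>
    <!-- line 6: User1 again, now with a new location -->
    <modify class-name="User" src-dn="User1">
      <association>User1</association>
      <modify-attr attr-name="location">
        <remove-all-values/>
        <add-value>
          <value>London</value>
        </add-value>
      </modify-attr>
    </modify>
  </input>
</nds>
```

Every line becomes its own operation under <input>, so the same object can show up many times, far apart, in a single document.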

Moves are what worry people most. When File System Management is in place and home directories are moved automatically depending on location, you do not want a home directory to be moved more than once, or at all if the move was caused by a mistake.

To prevent unwanted operations while the input document is being processed, especially anything that can trigger a move in the vault or in any of the connected systems, you have to introduce a "blocker" that makes all other drivers ignore the events caused by the input until processing is finished.

The solution is to add code which acts as a semaphore, and tells other processes (drivers) that they are not allowed to do anything with the object until the semaphore is released (http://en.wikipedia.org/wiki/Semaphore_(programming)).

It adds extra modifications and a small overhead, but in the bigger scheme of things this is better than waiting for everything to settle down after users have been moved by accident.

As an example I am using the delimited-text driver. I add code that adds an attribute to every object being added or modified, and when the whole input document has been processed I remove the attribute again, which releases the objects to be processed by other drivers.

To the input transformation we need to add code that does the following:

  1. find the number of entries in the document

  2. add two entries to the operation data:

    • the number of entries in the document

    • a counter holding the operation's position in the document

First we need to set up two local variables. DirXML-Script cannot set a variable (the counter) in the same rule as the for-each that uses it, so this has to be done in two rules, and the variables need to be global (driver scope):
<rule>
  <description>setVariables</description>
  <conditions>
    <and/>
  </conditions>
  <actions>
    <!-- reset the counter for this input document -->
    <do-set-local-variable name="lv_counter" scope="driver">
      <arg-string>
        <token-xpath expression="number(0)"/>
      </arg-string>
    </do-set-local-variable>
    <!-- number of operations in the input document; must match the last counter value -->
    <do-set-local-variable name="lv_inputDocCount" scope="driver">
      <arg-string>
        <token-xpath expression="count(//input/*)"/>
      </arg-string>
    </do-set-local-variable>
  </actions>
</rule>

Then we add the operation data to every operation in the input document, using a for-each that loops through the whole document:
<rule>
  <description>addOperationData</description>
  <conditions>
    <and/>
  </conditions>
  <actions>
    <!-- loop over every operation in the input document (an XPath node set, not a text token) -->
    <do-for-each>
      <arg-node-set>
        <token-xpath expression="//input/*"/>
      </arg-node-set>
      <arg-actions>
        <!-- position of this operation in the document -->
        <do-set-local-variable name="lv_counter" scope="driver">
          <arg-string>
            <token-xpath expression="number($lv_counter) + 1"/>
          </arg-string>
        </do-set-local-variable>
        <do-set-op-property name="opCounter">
          <arg-string>
            <token-text xml:space="preserve">$lv_counter$</token-text>
          </arg-string>
        </do-set-op-property>
        <!-- total number of operations in the document -->
        <do-set-op-property name="opCount">
          <arg-string>
            <token-text xml:space="preserve">$lv_inputDocCount$</token-text>
          </arg-string>
        </do-set-op-property>
      </arg-actions>
    </do-for-each>
  </actions>
</rule>

That will allow us to know where we are in the document when it is being processed.
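After these two rules have run, every operation in the input document carries the two values as operation properties, which IDM stores as attributes of an <operation-data> element inside the operation. For a hypothetical document with three operations, and assuming lv_inputDocCount holds the number of operations (count(//input/*)), the result would look roughly like this, with the attribute changes left out:

```xml
<nds dtdversion="4.0">
  <input>
    <modify class-name="User" src-dn="User1">
      <association>User1</association>
      <!-- modify-attr elements omitted -->
      <operation-data opCount="3" opCounter="1"/>
    </modify>
    <modify class-name="User" src-dn="User3">
      <association>User3</association>
      <!-- modify-attr elements omitted -->
      <operation-data opCount="3" opCounter="2"/>
    </modify>
    <modify class-name="User" src-dn="User1">
      <association>User1</association>
      <!-- modify-attr elements omitted -->
      <operation-data opCount="3" opCounter="3"/>
    </modify>
  </input>
</nds>
```

Only the last operation has opCounter equal to opCount, and that is the condition the output transformation looks for later.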

Then we need to add a "stopper" to every object that is being added or modified. Modifications matter most, but I have seen setups where objects are created in a temporary container and moved once they have been created, so remember that anything that can trigger another driver has to be caught:

    1. Create an auxiliary class that can be added to the user. In my example I have created:

      • class: doNotTouch

      • attr: doNotTouch

    2. Decide on a value, or maybe more. In my example I use "BLAST".

Then add the value to each object with a rule such as:

<rule>
  <description>add-doNotTouch</description>
  <conditions>
    <and/>
  </conditions>
  <actions>
    <do-add-dest-attr-value class-name="doNotTouch" name="doNotTouch">
      <arg-value type="string">
        <token-text xml:space="preserve">BLAST</token-text>
      </arg-value>
    </do-add-dest-attr-value>
  </actions>
</rule>
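As written, the add-doNotTouch rule has no conditions, so it fires for every operation, including deletes. Since the stopper is only needed when an object is added or modified, a variant could restrict it with if-operation tests. This is a sketch rather than the author's rule; if your setup creates objects in a temporary container and moves them afterwards, you may need to extend the list of operations:

```xml
<rule>
  <description>add-doNotTouch (add and modify only)</description>
  <conditions>
    <!-- only operations that can lead other drivers to act on the object -->
    <or>
      <if-operation mode="case" op="equal">add</if-operation>
      <if-operation mode="case" op="equal">modify</if-operation>
    </or>
  </conditions>
  <actions>
    <do-add-dest-attr-value class-name="doNotTouch" name="doNotTouch">
      <arg-value type="string">
        <token-text xml:space="preserve">BLAST</token-text>
      </arg-value>
    </do-add-dest-attr-value>
  </actions>
</rule>
```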


    3. Then, in every driver, add the following rule to the event transformation. In pseudocode:


if attr 'doNotTouch' = 'BLAST'
  veto

<rule>
  <description>stopIt</description>
  <conditions>
    <and>
      <if-attr mode="nocase" name="doNotTouch" op="equal">BLAST</if-attr>
    </and>
  </conditions>
  <actions>
    <do-trace-message>
      <arg-string>
        <token-text xml:space="preserve">OBJECT is locked</token-text>
      </arg-string>
    </do-trace-message>
    <do-veto/>
  </actions>
</rule>

Because 'doNotTouch' is cleared later, the other drivers will still get an event for the object. It also means that we can no longer use code like:
if attr 'Last Name' changing ...

because the only event the drivers get is the one caused by 'doNotTouch' changing.
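A downstream driver therefore has to recognise the release event and base its decisions on the object's current values (if-attr) rather than on what is changing in the operation. One way to flag that event for later rules in the same event transformation policy is sketched below, placed after the stopIt rule. The rule name and the variable lv_released are hypothetical, and the sketch presumably requires 'doNotTouch' to be in the driver's Subscriber filter, since otherwise the release event would not reach the driver at all:

```xml
<rule>
  <description>flagRelease</description>
  <conditions>
    <and>
      <!-- the operation changes doNotTouch ... -->
      <if-op-attr name="doNotTouch" op="changing"/>
      <!-- ... and afterwards the object no longer has a value for it -->
      <if-attr name="doNotTouch" op="not-available"/>
    </and>
  </conditions>
  <actions>
    <!-- policy scope: visible to the following rules for this operation only -->
    <do-set-local-variable name="lv_released" scope="policy">
      <arg-string>
        <token-text xml:space="preserve">true</token-text>
      </arg-string>
    </do-set-local-variable>
  </actions>
</rule>
```

Later rules can then test <if-local-variable mode="nocase" name="lv_released" op="equal">true</if-local-variable> together with if-attr conditions on the attributes they care about.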

Then we need to add code to the output transformation that checks whether the number of processed entries equals the number of entries in the document, which we get from the operation data in the status part of the output document. More information could be added to help with this, but I am keeping it as simple as possible:

  1. check that the status is success, and that the number of entries equals the number of processed entries

  2. find all objects that have 'doNotTouch' equal to 'BLAST'

  3. for each of these objects, remove the 'doNotTouch' value 'BLAST'


This will allow other drivers to start processing the objects.
<rule>
  <description>CleanUp</description>
  <conditions>
    <and>
      <if-operation mode="case" op="equal">status</if-operation>
    </and>
  </conditions>
  <actions>
    <!-- read the operation data returned with the status -->
    <do-set-local-variable name="lv_opCount" scope="policy">
      <arg-string>
        <token-op-property name="opCount"/>
      </arg-string>
    </do-set-local-variable>
    <do-set-local-variable name="lv_opCounter" scope="policy">
      <arg-string>
        <token-op-property name="opCounter"/>
      </arg-string>
    </do-set-local-variable>
    <do-if>
      <arg-conditions>
        <and>
          <!-- compare this operation's position with the total count -->
          <if-xpath op="true">$lv_opCount = $lv_opCounter</if-xpath>
        </and>
      </arg-conditions>
      <arg-actions>
        <!-- find every object that still carries the stopper -->
        <do-set-local-variable name="lv_foundObjects" scope="policy">
          <arg-node-set>
            <token-query datastore="src">
              <arg-match-attr name="doNotTouch">
                <arg-value type="string">
                  <token-text xml:space="preserve">BLAST</token-text>
                </arg-value>
              </arg-match-attr>
            </token-query>
          </arg-node-set>
        </do-set-local-variable>
        <!-- release each object by removing the stopper value -->
        <do-for-each>
          <arg-node-set>
            <token-local-variable name="lv_foundObjects"/>
          </arg-node-set>
          <arg-actions>
            <do-set-local-variable name="lv_object" scope="policy">
              <arg-string>
                <token-xpath expression="$current-node/@src-dn"/>
              </arg-string>
            </do-set-local-variable>
            <do-remove-src-attr-value name="doNotTouch">
              <arg-dn>
                <token-text xml:space="preserve">$lv_object$</token-text>
              </arg-dn>
              <arg-value type="string">
                <token-text xml:space="preserve">BLAST</token-text>
              </arg-value>
            </do-remove-src-attr-value>
          </arg-actions>
        </do-for-each>
      </arg-actions>
      <arg-actions/>
    </do-if>
  </actions>
</rule>
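For a hypothetical three-operation document, the status for the last operation would look roughly like this when it reaches the output transformation, assuming each operation's operation data comes back with its status (which this approach relies on) and lv_inputDocCount holds count(//input/*). This is the only status where the two values match, so it is the one that releases the objects:

```xml
<nds dtdversion="4.0">
  <output>
    <!-- status for the third and last operation of the input document -->
    <status level="success">
      <operation-data opCount="3" opCounter="3"/>
    </status>
  </output>
</nds>
```

The CleanUp rule as written only tests that the operation is a status, not its level. To follow step 1 strictly, a condition such as <if-xpath op="true">@level = "success"</if-xpath> could be added to the rule's <and>. Bear in mind that with such a test, a failure on the last operation leaves every object locked until the 'doNotTouch' value is removed by other means.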

The code above only removes the value 'BLAST' from the objects. If more than one value is used, consider clearing all values ('<remove-all-values>') instead.
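A sketch of that change, assuming the rest of the CleanUp rule stays as it is: the remove action inside the for-each is swapped for a clear action, which removes all values of 'doNotTouch' from the object. Note that the query in the rule still only finds objects with the value 'BLAST', so it would need to match your other values as well:

```xml
<!-- replaces do-remove-src-attr-value inside the CleanUp for-each -->
<do-clear-src-attr-value name="doNotTouch">
  <arg-dn>
    <token-text xml:space="preserve">$lv_object$</token-text>
  </arg-dn>
</do-clear-src-attr-value>
```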

This is one way of preventing parallel processing, and there may be other, simpler ways of doing it. In general, though, if the input can contain more than one modification to the same object, or if multiple moves might be triggered, it is worth considering something like this.

Version history: revision 4 of 4, last updated 2020-03-10 19:13.