Parallel processing of subscriber channel events

Idea ID 2821883

IDM processes events on the subscriber channel one at a time.

For each event on the subscriber channel, IDM runs through all the policies on the channel, and only when it is done does it continue to the next event.

The following image describes the subscriber channel processing:

[Image: Single processing mode]

In environments where millions of events reach the subscriber channel, processing takes a very long time, sometimes weeks.

I want to offer a solution that will make the subscriber channel faster.

It would be faster if it were split into multiple identical subscriber channels running the same logic. This feature would allow us to process multiple events in parallel: one event per channel (thread/process).

The following image describes the offered solution:

[Image: Multiprocessing mode]

I know that it doesn't fit all needs, but one would be able to activate this solution through a configuration setting for the number of simultaneous processing channels (the default should be 1).

6 Comments
Commodore

For some driver types, there is a fan-out driver that works in a similar manner. There is also a priority sync channel in the shims now that allows specific event types to get a fast pass (skip the queue) and be processed immediately.

The primary reason NetIQ hasn't implemented what you are describing is that IDM is an event-driven system, and in some cases the different events must be processed in order.

For example, suppose an account is created, then has additional attributes added, and then has its universal password set. In your model, these three events would be processed in parallel, and the modify and password sync events would get tossed due to a missing association. All three events in a typical creation are milliseconds apart.
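To make the ordering problem concrete, here is a minimal Python sketch. It is not IDM code: the `Event` class, the event kinds, and the rule "anything other than an add is dropped when the object has no association yet" are simplified assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int   # position in the original event queue
    kind: str  # "add", "modify", or "password" (illustrative kinds)
    obj: str   # name of the object the event refers to


def apply_events(events):
    """Apply events in the order given.

    Returns (applied, dropped) as lists of sequence numbers. A non-add
    event is dropped when its object has not been associated yet.
    """
    associated = set()
    applied, dropped = [], []
    for ev in events:
        if ev.kind == "add":
            associated.add(ev.obj)
            applied.append(ev.seq)
        elif ev.obj in associated:
            applied.append(ev.seq)
        else:
            dropped.append(ev.seq)
    return applied, dropped


events = [
    Event(1, "add", "jdoe"),
    Event(2, "modify", "jdoe"),
    Event(3, "password", "jdoe"),
]

# Serial processing keeps the queue order, so nothing is lost.
in_order = apply_events(events)

# One possible outcome when three workers race: the add completes last.
raced = apply_events([events[1], events[2], events[0]])

print(in_order)  # ([1, 2, 3], [])
print(raced)     # ([1], [2, 3])
```

In the raced ordering only the add survives, which is the "tossed due to a missing association" outcome described above.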

Many other IDM systems are reconciliation based, so they are able to process all records, then send all of the events as a snapshot.  Because it is a snapshot, not a live look, they are able to split into multiple threads.

We have implemented something similar, but the way it works is that you have multiple connectors to the same system with a logical divide in scoping. For example, you can create one driver for usernames that start with A-M and another for N-Z. This works if you have no interdependencies between those user objects, and it lets you specify which driver handles which events. While all drivers will see the initial event, one of the first policies (the subscriber event transform) evaluates whether to veto the event or process it. As a tip, to avoid keeping multiple copies of the same driver in sync, you could simply enable it on multiple servers in the tree. The GCVs and configuration parameters are server-specific, so each server in your tree could service a different subset. Packages are also a great way of ensuring that multiple drivers' policies don't get out of sync.

Commodore

I understand what you are saying: IDM is a trigger-based system, and events usually need to be processed in order. But I want to be able to decide whether to allow parallel processing or not, and to handle the outcomes myself in the policies.

I understand what you did by creating multiple drivers that share the same code, but it is not simple enough and it is hard to maintain (if I want 100 processes, I need to come up with a division function that divides the events among 100 drivers).
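For reference, a division function for an arbitrary number of drivers can be as small as a stable hash taken modulo the driver count. The sketch below is Python for illustration only: `driver_for` is a hypothetical name, the DN format is made up, and a real deployment would still have to express the same arithmetic inside each driver's scoping policy.

```python
import zlib


def driver_for(dn: str, driver_count: int) -> int:
    """Map an object DN to exactly one driver index in range(driver_count)."""
    if driver_count < 1:
        raise ValueError("driver_count must be at least 1")
    # CRC32 gives the same value on every run and every host, unlike
    # Python's built-in hash(), which is randomized per process for strings.
    # Lower-casing makes the mapping case-insensitive.
    return zlib.crc32(dn.lower().encode("utf-8")) % driver_count


# Hypothetical DNs, used only to show the call.
for dn in ["cn=jdoe,ou=users,o=acme", "cn=asmith,ou=users,o=acme"]:
    print(dn, "->", driver_for(dn, 100))
```

Each driver would then keep an event only when the computed index equals its own number, so every object lands on one driver without hand-written ranges. Changing the driver count remaps most objects, so the count is best treated as fixed once chosen.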

Knowledge Partner

If you look at the parallelization efforts they made in RRSD in 4.8, you can see a couple of different approaches.

They split the processing into multiple threads for different types of operations: roles vs. resources vs. users vs. dynamic groups (which I think were a separate thread in the past as well).

They also support adding an XML attribute, disjoint-set, to the operation node, which allows it to use a distinct thread, one per disjoint-set value, but you have to choose how to split it. The obvious example is by OU. Failing that, you can use A-Z to get 26 potential threads, but that is not very balanced with real names. And so on.
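How unbalanced a given split is can be checked before committing to it. A rough Python sketch follows, with a made-up name list and a hypothetical `imbalance` helper; it only counts buckets that actually receive names, and it assumes the list is not empty.

```python
from collections import Counter


def imbalance(names, key):
    """Ratio of the largest bucket to the mean bucket size (1.0 is perfectly even)."""
    sizes = Counter(key(name) for name in names)
    mean = sum(sizes.values()) / len(sizes)
    return max(sizes.values()) / mean


def first_letter(name):
    """Bucket key for an A-Z split."""
    return name[:1].upper()


# Invented sample, heavy on one initial.
names = ["smith", "sanchez", "singh", "suzuki", "adams", "xu"]
print(imbalance(names, first_letter))  # 2.0
```

Here the S bucket holds four of the six names, twice the mean of two per bucket, so the thread handling S would do twice its share of the work.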

Knowledge Partner

I've built something along those lines, running 12 drivers in parallel. You just have to be very careful to correctly scope the subscriber event transform so that one object is processed by exactly one driver.
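One way to sanity-check the "exactly one driver" rule before deploying is to run a list of object names through a model of each driver's scope and flag anything claimed by zero or several of them. A minimal Python sketch, where the predicates stand in for the real scoping policies and all names are hypothetical:

```python
def scope_violations(names, scopes):
    """Return {name: claim_count} for names not claimed by exactly one scope."""
    violations = {}
    for name in names:
        claims = sum(1 for in_scope in scopes if in_scope(name))
        if claims != 1:
            violations[name] = claims
    return violations


def a_to_m(name):
    """Model of a driver scoped to names starting with A-M."""
    return "a" <= name[:1].lower() <= "m"


def n_to_z(name):
    """Model of a driver scoped to names starting with N-Z."""
    return "n" <= name[:1].lower() <= "z"


sample = ["alice", "Nadia", "zed", "9admin"]
print(scope_violations(sample, [a_to_m, n_to_z]))  # {'9admin': 0}
```

The name starting with a digit is claimed by neither scope, so its events would be vetoed everywhere. A count of 2 or more would mean two drivers processing the same object.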

Commodore

@dgersic we now have the same challenge: we have to process millions of events on the subscriber channel, and many drivers just don't scale well.

What would be the best advice for that kind of environment? Any thoughts?

Example: the groupMembership attribute floods the channel with events and is subscribed to by many drivers.

/Maqsood.

Knowledge Partner

I can think of several ways to scope a driver so that you can run several subscribers in parallel. Can the destination system handle that level of traffic? Can you build or modify your design to allow for this?

The simplest is a scope by attribute value rule. Just use if-attr flag-attr not-equal "this driver" then veto() to discard the events not flagged for this driver.

If you can change the design of the vault, use replica placement to scope your drivers. If there is no replica on "this" server, then the driver running on "this" server sees no events.

You could also use vault design with if-src-dn not in subtree as a scoping mechanism.

If you can't modify the vault design, rules such as if source-name not-match "a*", and so on, could be used to scope based on object name, assuming you have a roughly even distribution of object names.

You can use the driver's security equal to and ACLs so that only one driver can actually process events for a specific object.

In the end, with no details on what you're trying to do, I can't very well tell you how to do it, or suggest which technique might be best. How about describing the system you have, and what you're trying to achieve with it?

 
