Parallel processing of subscriber channel events

Idea ID 2821883

IDM processes events on the subscriber channel one at a time.

For each event on the subscriber channel, IDM runs through all the policies on that channel, and only when it is done does it continue to the next event.

The following image describes the subscriber channel processing:

[Image: Single processing mode]

In environments where millions of events reach the subscriber channel, processing takes a very long time, sometimes weeks.

I want to propose a solution that makes the subscriber channel faster.

The subscriber channel would be faster if it were split into multiple identical subscriber channels, all running the same logic. This would let IDM process multiple events in parallel, one event per channel (thread or process).

The following image describes the proposed solution:

[Image: Multiprocessing mode]

I know that it doesn't fit all needs, but it could be enabled through a configuration setting for the number of simultaneous processing channels (the default should be 1).
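To make the proposal concrete, here is a minimal sketch in Python (not IDM engine code) of a configurable number of identical channels. The names `run_policies`, `process_events`, and `channels` are hypothetical and used only for illustration; with `channels=1` the events are handled strictly one at a time, as today.

```python
from concurrent.futures import ThreadPoolExecutor


def run_policies(event):
    """Stand-in for running every subscriber-channel policy on one event."""
    return f"processed {event}"


def process_events(events, channels=1):
    """Process events with `channels` identical workers (1 = sequential)."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    with ThreadPoolExecutor(max_workers=channels) as pool:
        # map() returns results in input order, but with channels > 1 the
        # events themselves may be processed in any order.
        return list(pool.map(run_policies, events))


if __name__ == "__main__":
    print(process_events(["add user1", "modify user2"], channels=4))
```

Note that with more than one channel nothing in this sketch guarantees that two events for the same object are processed in the order they arrived.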

17 Comments
Vice Admiral

For some driver types, there is a fan-out driver that works in a similar manner. The shims now also have a priority sync channel that allows specific event types to skip the queue and be processed immediately.

The primary reason NetIQ hasn't implemented what you are describing is that IDM is an event-driven system, and in some cases the different events must be processed in order.

For example, suppose an account is created, then has additional attributes added, and then has its universal password set. In your model, these three events would be processed in parallel, and the modify and password sync events would be discarded because of a missing association. All three events in a typical account creation arrive milliseconds apart.
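The ordering problem can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions: the `apply` function and the operation labels are hypothetical, and "association" is modelled as simple set membership rather than the engine's real association handling.

```python
def apply(events):
    """Apply events in the given order; return (associated objects, dropped events)."""
    associated, dropped = set(), []
    for op, dn in events:
        if op == "add":
            associated.add(dn)          # the add creates the association
        elif dn not in associated:
            dropped.append((op, dn))    # no association yet: event is discarded
    return associated, dropped


in_order = [("add", "jdoe"), ("modify", "jdoe"), ("modify-password", "jdoe")]
# One possible interleaving when the three events run on separate channels
# and the add happens to finish last.
raced = [("modify", "jdoe"), ("modify-password", "jdoe"), ("add", "jdoe")]

print(apply(in_order)[1])  # []
print(apply(raced)[1])     # [('modify', 'jdoe'), ('modify-password', 'jdoe')]
```

In the raced case the account still ends up associated, but the attribute values and the password never reach the connected system.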

Many other IDM systems are reconciliation-based, so they are able to process all records and then send all of the events as a snapshot. Because it is a snapshot, not a live view, they are able to split the work across multiple threads.

We have implemented something similar. The way it works is that you have multiple connectors to the same system, with a logical divide in scoping. For example, you can create one driver for usernames that start with A-M and another for N-Z. This works if you do not have interdependencies between those user objects, and it lets you specify which driver handles which events. While all drivers see the initial event, one of the first policies (the subscriber event transform) evaluates whether to veto the event or process it. As a tip, to avoid keeping multiple copies of the same driver in sync, you could simply enable it on multiple servers in the tree. The GCVs and configuration parameters are server-specific, so each server in your tree could service a different subset. Packages are also a great way of ensuring that the policies of multiple drivers don't get out of sync.
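The A-M / N-Z split can be sketched as follows. This is plain Python rather than DirXML policy, and the names `DRIVERS`, `handles`, and `route` are made up for the example; in a real deployment the equivalent test would sit in each driver's subscriber event transform and veto everything outside its range.

```python
def handles(driver_range, username):
    """Return True if this driver's letter range covers the username."""
    first, last = driver_range
    initial = username[:1].upper()
    return first <= initial <= last


# Hypothetical driver names mapped to the range of initials each one owns.
DRIVERS = {"driver-1": ("A", "M"), "driver-2": ("N", "Z")}


def route(username):
    """Every driver sees the event; all but the owner veto it."""
    return [name for name, rng in DRIVERS.items() if handles(rng, username)]


print(route("alice"))   # ['driver-1']
print(route("zoe"))     # ['driver-2']
print(route("9lives"))  # []
```

The last call shows a gap worth checking in any such scheme: a name that starts with something other than A-Z is vetoed by every driver, so one of the rules needs a catch-all.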

Commodore

I understand what you are saying: IDM is a trigger-based system, and events usually need to be processed in order. But I want to be able to decide whether to allow parallel processing or not, and to handle the outcomes in the policies myself.

I understand what you did by creating multiple drivers that share the same code, but it is not simple enough and it is hard to maintain (if I want 100 processes, I need to come up with a division function that divides the events among 100 drivers).

Knowledge Partner

If you look at the parallelization efforts they made in RRSD in 4.8, you can see a couple of different approaches.

They split the processing into multiple threads for different types of operations: roles vs. resources vs. users vs. dynamic groups (which I think were a separate thread in the past as well).

They also support adding an XML attribute, disjoint-set, to the operation node, which allows the operation to use a distinct thread. You get one thread per disjoint-set value, but you have to choose how to split it. The obvious example is by OU. Failing that, you can use A-Z to get 26 potential threads, but that is not very balanced with real names. And so on.
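To illustrate the balance problem, the sketch below compares a first-letter split with a split on a stable hash of the name. It is plain Python with a made-up sample of names; the functions `by_first_letter` and `by_hash` are hypothetical and say nothing about how RRSD computes or consumes disjoint-set values.

```python
import hashlib
from collections import Counter


def by_first_letter(name):
    """Partition key: the upper-cased first letter of the name."""
    return name[:1].upper()


def by_hash(name, buckets):
    """Partition key: a stable hash of the lower-cased name, modulo `buckets`."""
    digest = hashlib.sha256(name.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Made-up sample; real directories often cluster on a few initials like this.
names = ["smith", "sanchez", "schmidt", "singh", "suzuki", "adams", "xu", "silva"]

print(Counter(by_first_letter(n) for n in names))
print(Counter(by_hash(n, 4) for n in names))
```

In this sample six of the eight names land in the "S" partition. A hash tends to spread large populations more evenly and works for any bucket count, such as 100, but it splits objects without regard to any dependencies between them. `hashlib` is used instead of the built-in `hash()` because the latter is randomized per process for strings, so it would not give a stable assignment across restarts.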

 

Knowledge Partner

I've built something along those lines, running 12 drivers in parallel. You just have to be very careful to correctly scope the subscriber event transform so that one object is processed by exactly one driver.

Vice Admiral

@dgersic We now have the same challenge: we have to process millions of events on the subscriber channel, and simply adding more drivers doesn't scale well.

What would be the best advice for that kind of environment? Any thoughts?

Example: the groupMembership attribute floods the channel with events and is subscribed to by many drivers.

 

/Maqsood.

Knowledge Partner

I can think of several ways to scope a driver so that you can run several subscribers in parallel. Can the destination system handle that level of traffic? Can you build or modify your design to allow for this?

The simplest is a scope by attribute value rule. Just use if-attr flag-attr not-equal "this driver" then veto() to discard the events not flagged for this driver.

If you can change the design of the vault, use replica placement to scope your drivers. If there is no replica on "this" server, then the driver running on "this" server sees no events.

You could also use vault design with if-src-dn not in subtree as a scoping mechanism.

If you can't modify the vault design, rules such as if source-name not-match "a*" (and so on) could be used to scope based on object name, assuming you have a roughly even distribution of object names.

You can use the driver's security equal to and ACLs so that only one driver can actually process events for a specific object.

In the end, with no details on what you're trying to do, I can't very well tell you how to do it, or suggest which technique might be best. How about describing the system you have, and what you're trying to achieve with it?

 

Vice Admiral

Has anyone implemented an IDM engine in a big environment with millions of events and 200+ drivers (subscribers)? Can you share how you built a scalable system while keeping the IDM infrastructure as small as possible? Example on a driver set (but many servers):

 

We see a high performance penalty when all drivers subscribe to the "User/groupMembership" attribute.

/Maqsood.

 

 

Micro Focus Expert

@maqsood,

Can you provide some more details on the type of events where you are seeing the high performance penalty? Understanding your use case may allow some alternative options to be identified and considered: where the events originate (one central location vs. many sources merged across all locations), how many users, groups, and objects are involved, and how many attribute values are involved. Whether this is a high number of events or a high number of values being updated also has an impact on how to manage things. I am also wondering why you have so many drivers subscribing to the User/groupMembership attribute. Is there something in your design that you can share to provide a better overview for discussion?

Cheers,

D

Vice Admiral

@dstagg In our IDM system, most access is represented by eDirectory groups, for historical reasons and for several practical ones: LDAP queries are easy to build, groups are easy to see in iManager, various third-party LDAP management tools support them, external systems (which use our eDirectory as an external LDAP directory) use groupMembership in their LDAP filters, and eDirectory is more stable than userapp in case of an operational disruption or disaster. We cannot be dependent on userapp (Identity Applications).

To grant or revoke access, drivers subscribe to the User/groupMembership attribute. Some drivers process the groupMembership attribute; others only use it as a notification to carry out loopback operations for management.

We also use RBE (role-based entitlements) in situations where we need to build complex rules based on different LDAP attributes in order to grant access.

We also have some drivers that use RBE as the entitlement granting agent, but we find it very hard to query entitlement attributes with LDAP queries, so we map each entitlement value to a role or resource, which in turn becomes a groupMembership value (through a highly automated process).

User => RBE => MgmtDriver (DirXML-EntitlementRef: grant | revoke [add | remove role | resource]) => MgmtDriver (nrfResourceAssigned | nrfRoleAssigned) => groupMembership

In userapp (Identity Applications), each eDirectory group is represented as a "role" or "resource". Whenever we create a new group in eDirectory, a driver automatically creates the matching role or resource in userapp based on the eDirectory OU container, sets the owner, and sets delegated admin assignments (as role or resource manager) in case the owner of the role or resource needs to add access manually.

Our userapp experience is very unstable and slow, and its UX is not good. End users do not use it; it is used only for exceptions that need manual access, because our access management is highly automated.

The end result is that the user gets groupMembership values in eDirectory, and the IDM drivers grant or revoke access based on the group membership attribute.

 

@dstagg I know it's hard to describe the solution in detail. Is it possible to arrange a meeting or call to discuss the details?

/Maqsood.

 

 

 

 

 

 

 

Micro Focus Expert

@maqsood,

As much as a meeting or call might be helpful to discuss details, that is generally outside the scope of what I can offer through the community. At a high level, this appears to be a request that may be best addressed with an architectural review of your environment. That is usually best provided by Micro Focus or one of our partners for an appropriate fee. As you have asked here in the community, let's see if we (the community) can provide some general observations for your consideration and maybe some specific thoughts that you may find helpful.

Based on your description it sounds like you have some complex rules and processes to determine the effective group membership of individuals. That in itself should not be causing a performance issue as Identity Manager is very quick to process events generally. What you have described seems to be fairly common, though the scale of your environment may be different than others. The question then becomes what is causing the performance issues you are seeing.

  1. Are you seeing slowness when processing a single event? Simply adding or removing a value on the groupMembership attribute for a user should be processed extremely quickly.
  2. Are you seeing hundreds or thousands of events occurring as a result of a single change? If so, what is the change that is causing all the events to be generated?
  3. Are you seeing hundreds or thousands of values changing for a single user or group? If so, what is the source of those changes?

If group membership is changing in high numbers, it will help to understand how many changes occur in a given period of time, why they are happening at that rate, and what is generating them. That will be helpful in determining possible options. Functionally, any environment should be fairly stable most of the time, unless there is some policy or procedure that is changing attributes or values on a regular basis. This may then become a business issue rather than a technical issue. I had a situation where access rights were being added every morning and then removed every evening, and those procedures generated huge amounts of events. So understanding the type of changes and when they happen in your environment will be helpful.

In this discussion I believe that stepping back a bit to get a good feel for your use case(s) and the behaviors you have observed, will help guide responses faster at this time.

Cheers,

D
