Getting Started Building a SOAP Driver for IDM - Part 9

Starting A SOAP Driver for IDM Part 9:

Novell Identity Manager comes with a number of prebuilt, out of the box drivers that do what is needed in most cases. However, some drivers allow for so much flexibility that no out of the box configuration will ever be complete. The JDBC driver, which can connect to many different databases, ships with some of the big ones (Oracle, Microsoft SQL Server) configured, but the rest are largely up to you, mostly because almost everyone uses databases differently. Novell has talked about a fan-out configuration for the JDBC driver that will come out after the formal release of Identity Manager 4.0. That will probably handle the case of out of the box database models, like using Oracle with 'Oracle Users'. In that case you can imagine a model fairly easily where the driver is set up to push users into many dozens of such Oracle databases that all use the same basic model of users, differing only in host information (DB server, port, database name, etc.) and using entitlements to specify which databases the user gets access to.

The SOAP driver is even harder to provide useful default configurations for. It ships with a DSML and an SPML 1.0 configuration, since those are the only really mature standards for SOAP operations involving users, but SOAP is basically as open ended as you want it to be, and everyone does whatever they want.

For example, the User Application is more about Provisioning than user events directly, but if you really wanted to, you could use the SOAP driver to talk to the User Application. (Actually, it is almost as much fun to use a SOAP integration activity in a Workflow to talk to the User Application to do stuff.)

With that said, there are some big targets for SOAP that you could develop configurations for. I started this series to try and provide notions on how you might do that, using Salesforce.com as the target, since it is a pretty big target to aim at. However, the concepts involved would be much the same for any other SOAP system. In fact, I used the ideas I developed here in a SOAP integration activity in a Workflow to call User Application web services. It sounds silly, but a Provisioning Request Definition (PRD) that cancels another PRD can be quite useful. (There is a Start Workflow token, but no Stop Workflow token. So to make one, you use Start Workflow to call a Workflow or PRD that stops the specified workflow. The same trick solves the lack of an Approve Workflow token: use Start Workflow to call a PRD that approves a running workflow.) I need to find the time to write an article about that whole concept.

In part 1 of this series, Getting Started Building a SOAP Driver for IDM - Part 1 I discussed some of the things you need to get started building a SOAP driver. I was using the example of Salesforce.com (henceforth known as SFDC, since typing the full name is too much of a pain each time). In Part 1 I focused on how you might start connecting via SOAP to get a session ID.

In Part 2 of this series Getting Started Building a SOAP Driver for IDM - Part 2 I discussed how you might process the results from SFDC after you submit a login request, and how to convert them into an <instance> document.

In Part 3 of this series Getting Started Building a SOAP Driver for IDM - Part 3 I discussed how you might handle query events and their responses.

In Part 4 of this series Getting Started Building a SOAP Driver for IDM - Part 4 I talked about some of the background stuff you need to manage, like attribute syntaxes, and left hanging two more concepts: Subscriber channel write events, like <add> or <modify> events, that need to be sent to SFDC, and the ability to get events onto the Publisher channel.

In Part 5 of this series Getting Started Building a SOAP Driver for IDM - Part 5 I started talking about how you would map add and modify events from Identity Manager into SFDC events. This would allow you to write changes (modify events) back to SFDC, or add new users to SFDC.

I started talking about how you would handle modify events, and left add events as an exercise. However, I did not finish the modify discussion. I showed some sample code to manage it, but I would like to discuss the actual process that the code sample uses.

In Part 6 of this series Getting Started Building a SOAP Driver for IDM - Part 6 I finished talking about how to handle modify events. However add events were not addressed.

In Part 7 of this series Getting Started Building a SOAP Driver for IDM - Part 7 I discussed the issues you would need to address to handle <add> events, even though I did not actually implement it in practice. I also discussed that Novell has released Identity Manager 4.0 Advanced Edition, which has an integration module available for Salesforce.com.

In Part 8 of this series Getting Started Building a SOAP Driver for IDM - Part 8 I began talking about how you can enable the Publisher channel and retrieve updated objects as events from SFDC. This is the key feature that the shipping Novell SFDC driver (as part of IDM 4) is missing. That part started explaining how to use the getUpdated() function.

Where we left off in Part 8, after talking about the getUpdated() call and describing how to get the current time from SFDC with the getServerTimestamp call, was the question: what do you do next? The getUpdated() call requires three things: a start time, an end time, and an object class (sObjectType). As we discussed, the start time is the last time we successfully ran, which we store on the driver object in the Last Referenced Time attribute, so that on a driver restart we have a record of where we left off. The end time we get by asking SFDC for getServerTimestamp, in other words, what time SFDC thinks it is right now. Finally, we need to decide which object classes to ask about.
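
To make that concrete, here is roughly what the body of a getUpdated call looks like against the SFDC partner API. This is a hand-written sketch, not lifted from the driver; the element names come from the partner WSDL, and the session ID, object class, and dates are placeholders.

  <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                    xmlns:urn="urn:partner.soap.sforce.com">
    <soapenv:Header>
      <urn:SessionHeader>
        <urn:sessionId>PLACEHOLDER-SESSION-ID</urn:sessionId>
      </urn:SessionHeader>
    </soapenv:Header>
    <soapenv:Body>
      <urn:getUpdated>
        <urn:sObjectType>Group</urn:sObjectType>
        <!-- start time: the Last Referenced Time we stored on the driver object -->
        <urn:startDate>2010-11-01T00:00:00.000Z</urn:startDate>
        <!-- end time: whatever getServerTimestamp just told us -->
        <urn:endDate>2010-11-02T00:00:00.000Z</urn:endDate>
      </urn:getUpdated>
    </soapenv:Body>
  </soapenv:Envelope>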

As I discussed in an earlier part of this series (Part 3), regarding how I handled Query events, I used a mapping table to store all the object classes and what a SELECT * would imply in terms of all attributes. At the time I mentioned that I probably should have parsed through the Schema Map or Filter to get this information, instead of using a mapping table.

However, once I had a working, populated mapping table, it was pretty easy to just add a column, which I called counter, that holds a number starting at one and incrementing sequentially for each object class row.

Then initialize a counter (Set Local Variable to the XPATH of number(1)) and start a while loop that terminates when the result of the mapping table call is XXYY. Inside the loop, increment the counter with a Set Local Variable to the XPATH of $COUNTER + 1, and use the COUNTER local variable as the value for the counter source column in the Map token, with the object class as the destination column.
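
In DirXML Script, that loop might look something like the sketch below. The table name (ClassMap), the column names (counter and objectclass), and the CURRENT-CLASS variable are names I made up for illustration, and I am assuming the table has a final row that returns the XXYY sentinel so the loop knows when to stop.

  <!-- initialize the loop counter to 1 -->
  <do-set-local-variable name="COUNTER" scope="policy">
    <arg-string>
      <token-xpath expression="number(1)"/>
    </arg-string>
  </do-set-local-variable>
  <!-- look up the first object class from the mapping table -->
  <do-set-local-variable name="CURRENT-CLASS" scope="policy">
    <arg-string>
      <token-map dest="objectclass" src="counter" table="..\ClassMap">
        <token-local-variable name="COUNTER"/>
      </token-map>
    </arg-string>
  </do-set-local-variable>
  <do-while>
    <arg-conditions>
      <and>
        <!-- stop once the table hands back the sentinel row -->
        <if-local-variable mode="nocase" name="CURRENT-CLASS" op="not-equal">XXYY</if-local-variable>
      </and>
    </arg-conditions>
    <arg-actions>
      <!-- ... issue the getUpdated query for $CURRENT-CLASS$ here ... -->
      <!-- increment the counter and look up the next object class -->
      <do-set-local-variable name="COUNTER" scope="policy">
        <arg-string>
          <token-xpath expression="$COUNTER + 1"/>
        </arg-string>
      </do-set-local-variable>
      <do-set-local-variable name="CURRENT-CLASS" scope="policy">
        <arg-string>
          <token-map dest="objectclass" src="counter" table="..\ClassMap">
            <token-local-variable name="COUNTER"/>
          </token-map>
        </arg-string>
      </do-set-local-variable>
    </arg-actions>
  </do-while>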

You can read more about this approach in a two part article I wrote about round robin support for a series of post offices in a GroupWise driver:
Load Balancing User Placement in GroupWise Post Offices - Part 1
Load Balancing User Placement in GroupWise Post Offices - Part 2

Now that is actually the easiest part. The next bit is the interesting part: where do you implement all this logic? The obvious place is in the Publisher Event Transformation policy set. However, there are some interesting timing and policy interaction issues involved.

Initially, I figured I would take the getUpdated call and send it into the Subscriber channel, like I did with the login function discussed in Parts 1 and 2 of this series. The problem with that approach is that the event gets queued into the cache, you do not get an immediate response, and in fact it is very hard to process the event when it returns. I suppose if you added an operational property that carried information about why you started the process, you could handle it when it finally returns. However, that is way less than optimal.

Thus I continued down the approach I took with Query events, and just handled getUpdated as a special query case (class-name="getUpdated"), calling it inside my while loop, once for each object class.
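
There is no single right way to shape that query. Here is one plausible form, where the start time, end time, and SFDC object class ride along as search attributes and the Output transform turns it into the actual getUpdated SOAP document shown earlier. The attribute names (sObjectType, startDate, endDate) are my own choice for illustration, not something the shim dictates.

  <query class-name="getUpdated" event-id="getUpdated-Group" scope="subtree">
    <search-attr attr-name="sObjectType">
      <value>Group</value>
    </search-attr>
    <search-attr attr-name="startDate">
      <value>2010-11-01T00:00:00.000Z</value>
    </search-attr>
    <search-attr attr-name="endDate">
      <value>2010-11-02T00:00:00.000Z</value>
    </search-attr>
  </query>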

Now in the Input transform, I had a rule that converted the getUpdatedResult documents that SFDC returns, which are just a list of database IDs, into an <instance> doc with a class-name of getUpdatedResponse. I would have put in the real object class, but I cannot know what it is based solely on the result document.

However, inside my loop I know what the object class is, so I nest a for-each loop inside, based on the results, so that for each <Id> node in the response I build a <sync> event, node by node, just like we did for building the SOAP documents in the Output transform, and for converting the SOAP documents into XDS in the Input transform. The src-dn is the database ID, of course, as is the <association> node's value. The class-name we know, since we are inside the WHILE loop and it was used to call the getUpdated function.
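
The <sync> events themselves end up being tiny, something like the following. The Id value and event-id shown here are invented for the example; the class-name comes from the WHILE loop, and both the src-dn and the <association> carry the SFDC database ID.

  <sync class-name="Group" event-id="sfdc-poll" src-dn="00G70000001SAMPLE">
    <association>00G70000001SAMPLE</association>
  </sync>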

I originally thought I could build a <modify> document here and have it come into the rest of the Event transform as if every attribute was changing, and then let the engine figure out what really changed via Optimize Modify. However, what I found was that I was converting a query response document, which was nestled inside an <output> node, and while the engine would let it into the Event transform, it would not progress it past there, since a query response is not meant to flow any further.

Thus I hit upon the <sync> event idea. By building a <sync> event in the Publisher Event transform, the engine will process it further, which is all we really need it to do. This only works in the Event transform, so that is one of the timing issues.

That worked for quite a while, but then every event was a from-merge event and I was running into issues with merge authority that I did not like at all.

To work around that, in the next Policy object in the Publisher Event Transform, after the <sync> event is generated, I converted it to a <modify> event. To do that, you rebuild the node set node by node, but you also query back into SFDC for all attributes and convert the results into <modify-attr> nodes.

To make sure it works correctly, you need to not just add <add-value> nodes to the <modify> doc; you should start each attribute with a <remove-all-values> node, since SFDC does not support multivalued attributes and any change is always a set, not an add, event.
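
So the <modify> that finally flows on out of the Event transform looks roughly like this, with one <modify-attr> per attribute read back from SFDC, each starting with a <remove-all-values>. The attribute name and value here are invented for the example.

  <modify class-name="Group" event-id="sfdc-poll" src-dn="00G70000001SAMPLE">
    <association>00G70000001SAMPLE</association>
    <modify-attr attr-name="Name">
      <remove-all-values/>
      <add-value>
        <value type="string">Sales West</value>
      </add-value>
    </modify-attr>
    <!-- repeat a modify-attr block for each attribute returned by the query back to SFDC -->
  </modify>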

This seems a bit unwieldy: query with getUpdated, convert the <instance> doc that comes back into a <sync> event per returned database ID, and then convert each <sync> event again into a <modify> event. But it does work nicely.

Once you have it debugged, you definitely want to disable trace on as many of these rules as possible, specifically any with loops, as you can hit some serious performance overhead in the engine if it has to display all the work it is doing, especially since in a migrate you might have a thousand or two nodes that need to be converted.

I found that with 1800 Groups returned from SFDC, each with 17 attributes, it was taking 20 minutes with trace off to convert that into an 1800-node <instance> doc so the migrate could proceed. The CPU (two allocated CPUs for the VM) was at 100% the whole time, and the engine was chugging away, but generating zero trace, so it looked hung.

I was able to speed that up immensely by changing the query used in the migrate from requesting no specific attributes (and thus all attributes, all 17 of them, which were then thrown away by the conversion to a <sync> event, which has no attributes listed, only to be re-queried for later in the conversion to <modify>) to requesting just the database ID specifically. Processing time dropped to about four minutes, which was short enough that an impatient person would not restart the directory on you, thinking it was hung. With trace enabled for these rules, you can imagine how long it might take! The penalty can really be high, from two to five times as long.
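
In other words, the migrate query changed from one with no <read-attr> nodes at all (which in XDS means return every attribute) to one that names just the database ID. Something like the following, assuming Id is what the database ID is called on the SFDC side of your Schema Map:

  <query class-name="Group" scope="subtree">
    <search-class class-name="Group"/>
    <!-- ask only for the database ID; everything else gets re-queried later anyway -->
    <read-attr attr-name="Id"/>
  </query>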

This is one of those Catch-22 situations. On the one hand, NTS says do not run with trace on, it kills performance. Agreed, it does. But then how do you catch and fix issues that occur without seeing what happened in trace?

The trade-off I lean towards is to run with trace on, but disable trace on a rule by rule basis for really verbose, slow stuff, like all the conversions we do in this driver.

Well, that's it for now. I think I should be able to wrap up the rest in the last part of this series.
