Bidirectional eDirectory driver - Dealing with a strange error

1 Likes
Bidirectional eDirectory driver, String index out of range error.

The new Bidirectional eDirectory driver is an interesting new thing that came with the IDM 4.02 release. Up until now, the eDirectory driver was actually the hardest driver of them all, since it really was two drivers, with half in each tree, but it was the inbound, Publisher channel that was really doing the work in each half driver.

This confuses the heck out of most sane people, and compared to an Active Directory or LDAP driver this was a real pain to work with. The new Bidirectional driver removes a lot of those issues, and folds it down to basically a single driver based off the LDAP driver, and a ChangeLog module for the remote eDirectory tree. Just like the LDAP driver can register with existing changelog modules in various LDAP services (M-Vault, Sun Ones etc).

If you are running this driver, and you are not at the IDM 4.02 Engine patch 4 level (Not released as I write this, but my bugs have been confirmed to have fixes released in IDM 4.0.2.4) then make sure to get the patches from support. But I am sure this will all be moot by the time you read this.

I had reviewed the packages from the earlier releases of the driver, which have since changed, and I should probably find the time to go look at what has changed. (But time is one of those things we often do not have enough of...).

You can read that series here: Walking through the Bidirectional eDirectory Driver

I was working with this driver recently and ran into an issue, that took a fair bit of playing around to finally get down to the root cause, which was quite interesting, and I thought would be worth sharing.

When we started up the driver, we were getting a fatal error, that shut it down right away that looked like this is trace.

[11/20/13 18:13:54.086]:eDir Sync PT:eDir Sync: EdirPublisher - Initiating agent registration...
[11/20/13 18:13:54.090]:eDir Sync PT:eDir Sync: OpenLDAPConnection - Connect to the server
[11/20/13 18:13:54.091]:eDir Sync PT:eDir Sync: Opening SSL connection
[11/20/13 18:13:54.091]:eDir Sync PT:eDir Sync: LDAPInterface.registerDriverInstance() : Exception occurred while registration - String index out of range: -1
[11/20/13 18:13:54.092]:eDir Sync PT:PublicationShim.start() returned:
[11/20/13 18:13:54.093]:eDir Sync PT:
<nds dtdversion="4.0">
<source>
<product instance="eDir Sync" version="4.0.1.0">Identity Manager Bi-directional Driver for eDirectory</product>
<contact>Novell, Inc.</contact>
</source>
<output>
<status level="fatal">String index out of range: -1</status>
</output>
</nds>


So why are we seeing this error message in the log?

<status level="fatal">String index out of range: -1</status>


Well just before the first instance of that error, that we see in the trace:

[11/20/13 18:13:54.086]:eDir Sync PT:eDir Sync:  WARNING !!! No driver filter found... Driver may not start.


Why no filter? We confirmed that it existed and looked appropriate. There is a known bug that is resolved in a driver patch, where the filter that the Changelog module uses is basically reversed channels. It makes a fair bit of sense after all. The Changelog module is basically dxevent from the IDM engine, bundled up, I assume with minimal code changes to make maintenance easier (if they fix a bug in dxevent for the engine, easy enough to port it over to Changelog as well this way). Well, from the engine perspective, the Subscriber channel is what it uses, when a change in eDirectory occurs and it needs to send it out. But from the perspective of the Changelog in use with this driver, that should really be the attributes in the filter on the Publisher channel. Thus if your filter is not Sync in both directions for each attribute you might get odd results. This is fixed, and will be released in the next engine path (IDM 4.02 Patch 4 at the time of this writing)

This driver has an interesting twist, which makes it all the more confusing at first. The normal filter loading happens near the first <init-params> nodes (There will be two more later) as:

[11/20/13 17:47:58.241]:eDir Sync ST:Loaded filter.
[11/20/13 17:47:58.242]:eDir Sync ST:
<filter>
<filter-class class-name="User" publisher="sync" publisher-create-homedir="true" publisher-track-template-member="true" subscriber="sync">
<filter-attr attr-name="DirXML-EntitlementRef" merge-authority="edir" publisher="ignore" publisher-optimize-modify="true" subscriber="notify"/>
<filter-attr attr-name="nspmDistributionPassword" merge-authority="default" publisher="sync" publisher-optimize-modify="true" subscriber="sync"/>
<filter-attr attr-name="workforceID" publisher="sync" subscriber="sync"/>
<filter-attr attr-name="vehicleInformation" publisher="sync" subscriber="sync"/>
</filter-class>
</filter> (Snipped lots for brevity).


But this one also reads the filter back at the very end of the process, and uses it to tell ChangeLog on the remote eDirectory side what it should be listening for.

It does this as sort of the last thing before starting up fully by using a Query for the DirXML-DriverFilter attribute value:


[11/20/13 18:13:54.049]:eDir Sync PT:
<nds dtdversion="4.0">
<source>
<product instance="eDir Sync" version="4.0.1.0">Identity Manager Bi-directional Driver for eDirectory</product>
<contact>Novell, Inc.</contact>
</source>
<input>
<query class-name="Dirxml-Driver" dest-dn="\acme-IDV\acme\idm\DriverSet\eDir Sync" scope="subtree">
<search-class class-name="Dirxml-Driver"/>
<read-attr attr-name="DirXML-DriverFilter"/>
</query>
</input>
</nds>


Now we were getting this result:

[11/20/13 18:13:54.085]:eDir Sync PT:
<nds dtdversion="4.0" ndsversion="8.x">
<source>
<product edition="Standard" version="4.0.2.0">DirXML</product>
<contact>Novell, Inc.</contact>
</source>
<output>
<instance class-name="DirXML-Driver" event-id="0" qualified-src-dn="O=acme\OU=idm\CN=DriverSet\CN=eDir Sync" src-dn="\acme-IDV\acme\idm\DriverSet\eDir Sync" src-entry-id="33028"/>
<status event-id="0" level="success"></status>
</output>
</nds>


That is an empty response. Our Query asked for the read-attr of DirXML-DriverFilter, and no such attribute with a value was returned. We sent a <query> event, and got an <instance> but no attribute data in it. (Would have been a series of nodes like <attr attr-name="DirXML-DriverFilter"> nodes with <value> nodes under it).

Proper response (once all was working) is more like this truncated example, (the attribute value is very long, and of no value to actually show beyond one small bit):

[11/21/13 15:48:51.133]:eDir Driver PT:
<nds dtdversion="4.0" ndsversion="8.x">
<source>
<product edition="Standard" version="4.0.2.0">DirXML</product>
<contact>Novell, Inc.</contact>
</source>
<output>
<instance class-name="DirXML-Driver" event-id="0" qualified-src-dn="O=acme\OU=idm\CN=DriverSet\CN=eDir Driver" src-dn="\acme-IDV\acme\idm\DriverSet\eDir Driver" src-entry-id="33028">
<attr attr-name="DirXML-DriverFilter">
<value timestamp="1385054584#29" type="octet">yPSJzeW5jIiBzdWJzY3JpYmVyPSJzeW5jIi8 Cgk8L2ZpbHRlci1jbGFzcz4KPC9maWx0ZXI </value>
</attr>
</instance>
<status event-id="0" level="success"></status>
</output>
</nds>


The result is base 64 encoded, and if you decode it, you get the XML of the driver filter.

When it was not working to return the filter, we would end up with this trace snippet:

[11/20/13 18:13:54.086]:eDir Sync PT:eDir Sync:  WARNING !!! No driver filter found... Driver may not start.
[11/20/13 18:13:54.086]:eDir Sync PT:eDir Sync: EDirPublicationShim : Performing Agent Registration...
[11/20/13 18:13:54.086]:eDir Sync PT:eDir Sync: EdirPublisher - Initiating agent registration...
[11/20/13 18:13:54.090]:eDir Sync PT:eDir Sync: OpenLDAPConnection - Connect to the server
[11/20/13 18:13:54.091]:eDir Sync PT:eDir Sync: Opening SSL connection
[11/20/13 18:13:54.091]:eDir Sync PT:eDir Sync: LDAPInterface.registerDriverInstance() : Exception occurred while registration - String index out of range: -1
[11/20/13 18:13:54.092]:eDir Sync PT:PublicationShim.start() returned:
[11/20/13 18:13:54.093]:eDir Sync PT:
<nds dtdversion="4.0">
<source>
<product instance="eDir Sync" version="4.0.1.0">Identity Manager Bi-directional Driver for eDirectory</product>
<contact>Novell, Inc.</contact>
</source>
<output>
<status level="fatal">String index out of range: -1</status>
</output>
</nds>


Can you guess why no filter values were returned? Permissions. We did not give the driver object itself, permissions in eDirectory to read its own attributes. Normally this is not required, though with the self modifying policies we have now for MSysInfo and entitlementConfiguration object, the driver needs more than read permissions but also some write permissions to itself. Of course, if you make it Security Equals to admin this is a non-issue, but it is better to be more explicit in the rights you grant.

You can see this more clearly if you look closely at the trace. You will see a line like this in the logs:

[11/20/13 18:13:54.057]:eDir Sync PT:Driver object has insufficient rights to read \acme-IDV\acme\idm\DriverSet\eDir Sync#DirXML-DriverFilter.


Thus the query was able to find the object, but not read its contents.

There is a stage just after you deploy a new driver, when you get asked in Designer to Define Security Equivalences, however, we skipped it on this driver, saying, we will get to this later. I guess this is later. Fixing it is a piece of cake. You could just add a DN of a user into the drivers Security Equivalent attribute, which is how you do it in iManager, for the most part. Or you can right click in Designer on the driver line, select Live, then at the bottom of the menu is Set Up Driver Security. You can also exclude Administrative Users from this interface, which adds a DirXML-Associations with a nameSpace value of 0, on the named objects.

But turns out it was more than that. Even after we fixed the Filter issue, we still got the same error. So what else can it be?

Continuing on, we found other errors to be resolved.

[11/20/13 17:47:58.043]:eDir Sync ST:Named password 'ksore-passwd' not found in driver configuration element 'Keystore password' (keystore-pass).


Also, at the very beginning, it tries to read named passwords, and none found:
[11/20/13 17:47:57.840]:eDir Sync :Reading named passwords list.
[11/20/13 17:47:57.841]:eDir Sync :Named passwords:


The Named password to open up the keystore holding the tree CA's public key is missing. Seems like it did not properly deploy. Easy enough to fix.

Once we added it, trace now shows:

[11/21/13 12:24:10.408]:eDir Driver :Reading named passwords list.
[11/21/13 12:24:10.409]:eDir Driver :Named passwords:
[11/21/13 12:24:10.410]:eDir Driver : Name: ksore-passwd

What is interesting is that kstore-passwd is not a typo. Or at least it is what the driver is really expecting.

Looks like the packaged version we used, did not create the Named Password for us, so it was easily resolved.

Next test, see if we have a keystore format, keystore password, or a wrong key problem, so we changed config value "accept-srv-ldaps-cert" to true, I.e. Always accept the LDAP cert for SSL. Which is probably the easiest way to run, since you are controlling both ends, and it is incredibly unlikely someone will try a man in the middle attack on this connection.

When it first tries to connect to get the Application Schema, we see:

[11/21/13 12:31:19.977]:eDir Driver ST:EDirectory Driver: EDIRDriverShim.getSchema() called
[11/21/13 12:31:19.978]:eDir Driver ST:EDirectory Driver: OpenLDAPConnection - Connect to the server
[11/21/13 12:31:19.979]:eDir Driver ST:EDirectory Driver: Error occurred while getting socket factory. Reason : String index out of range: -1
[11/21/13 12:31:19.986]:eDir Driver ST:EDirectory Driver: An exception occurred retrieving schema: LDAPException: Unable to initialize socket factory (80) OtherLDAPException: Server Message: String index out of range: -1
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
[11/21/13 12:31:19.987]:eDir Driver ST:EDirectory Driver: LDAPException: Unable to initialize socket factory (80) Other
LDAPException: Server Message: String index out of range: -1
java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at com.novell.nds.dirxml.driver.edir.ShimData.getSocketFactory(ShimData.java:1168)
at com.novell.nds.dirxml.driver.edir.LDAPInterface.openLDAPConnection(LDAPInterface.java:145)
at com.novell.nds.dirxml.driver.edir.EDIRDriverShim.getSchema(EDIRDriverShim.java:180)
at com.novell.nds.dirxml.engine.Driver.getShimSchema(Driver.java:1437)
at com.novell.nds.dirxml.engine.Driver.startShim(Driver.java:1294)
at com.novell.nds.dirxml.engine.Driver.initialize(Driver.java:264)
at com.novell.nds.dirxml.engine.Driver.<init>(Driver.java:236)
at com.novell.nds.dirxml.engine.DriverEntry.run(DriverEntry.java:551)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
at java.lang.String.substring(String.java:2286)
at com.novell.nds.dirxml.driver.edir.ShimData.getServerName(ShimData.java:756)
at com.novell.nds.dirxml.driver.edir.ShimData.getSocketFactory(ShimData.java:1155)
... 8 more

[11/21/13 12:31:19.989]:eDir Driver ST:DriverShim.getSchema() returned:
[11/21/13 12:31:19.990]:eDir Driver ST:null document
[11/21/13 12:31:19.990]:eDir Driver ST:
DirXML Log Event -------------------
Driver: \acme-IDV\acme\idm\DriverSet\eDir Driver
Status: Warning
Message: Code(-8001) Unable to retrieve application schema.


So it not specifically related to the Keystore file or configuration (since we turned off the checking on that). Still a basic connectivity issue, before we even get to SSL operations.

Interestingly, we see in trace this line:
[11/21/13 12:31:22.602]:eDir Driver ST:eDir Driver: Cleaning up auto keystore : eDir2eDir-6D9D93B0-96A6-46a0-7C87-B0939D6DA696.keystore


The GUID is the driver GUID, and it seems to use the same file. It looks like it does not delete it, rather it removes the certificate from it.

So the driver makes its own keystore, dynamically, just accepting the CA's public key (which is the design) and then cleans up when done. That is pretty clever.

Finally we realized the error was that the Authentication Context was set to idv1.acme.com instead of idv1.acme.com:636 to specify a port. It is substringing the line, and getting nothing for the port, and throwing an error when it substrings.

For fun, since I am collecting errors from this driver, lets set it to 389, but configure the driver to use SSL and we see:

[11/21/13 15:29:00.986]:eDir Driver ST:eDir Driver: LDAP Search
base=
scope=0
filter=(objectclass=*)
attrs=[subschemasubentry, 2.5.18.10]
attrsOnly=false
[11/21/13 15:29:00.986]:eDir Driver ST:eDir Driver: OpenLDAPConnection - Connect to the server
[11/21/13 15:29:00.986]:eDir Driver ST:eDir Driver: Opening clear text connection
[11/21/13 15:29:01.000]:eDir Driver ST:eDir Driver: Schema.Schema() - LDAPException: Connection lost waiting for results from acme-edir.root.acme.com:636 (91) Connect Error
java.io.EOFException: BERDecoder: decode: EOF in Identifier


Once the proper port of 636 is set, when GetSchema() is called, we see:

[11/21/13 15:31:31.186]:eDir Driver ST:EDirectory Driver: EDIRDriverShim.getSchema() called
[11/21/13 15:31:31.186]:eDir Driver ST:EDirectory Driver: OpenLDAPConnection - Connect to the server
[11/21/13 15:31:31.189]:eDir Driver ST:EDirectory Driver: Opening SSL connection
[11/21/13 15:31:31.750]:eDir Driver ST:EDirectory Driver: Host name: acme-edir.root.acme.com
[11/21/13 15:31:31.750]:eDir Driver ST:EDirectory Driver: Port: 636
[11/21/13 15:31:31.751]:eDir Driver ST:EDirectory Driver: DN: cn=idm,ou=it,o=acme
[11/21/13 15:31:31.751]:eDir Driver ST:EDirectory Driver: Protocol version=3
[11/21/13 15:31:31.751]:eDir Driver ST:EDirectory Driver: SDK version=4.6
[11/21/13 15:31:31.751]:eDir Driver ST:EDirectory Driver: LDAP Search
base=
scope=0
filter=(objectclass=*)
attrs=[subschemasubentry, 2.5.18.10]
attrsOnly=false
[11/21/13 15:31:31.761]:eDir Driver ST:EDirectory Driver: LDAP Search
base=cn=schema
scope=0
filter=(objectclass=*)
attrs=[objectclasses, 2.5.21.6, attributetypes, 2.5.21.5]
attrsOnly=false
[11/21/13 15:31:32.303]:eDir Driver ST:EDirectory Driver: Schema.Schema() Successfully read schema from cn=schema
[11/21/13 15:31:34.409]:eDir Driver ST:EDirectory Driver: No exception occurred retrieving schema.
[11/21/13 15:31:34.409]:eDir Driver ST:DriverShim.getSchema() returned:
[11/21/13 15:31:34.589]:eDir Driver ST:
<nds dtdversion="1.0" ndsversion="8.5">
<output>
<schema-def dn-format="ldap">
<class-def asn1id="2.5.6.0" class-name="Top" container="true">
<attr-def asn1id="2.5.4.0" attr-name="objectClass" required="true"/>
<attr-def asn1id="2.16.840.1.113719.1.1.4.1.12" attr-name="cAPublicKey" multi-valued="false" read-only="true" type="octet"/>
</class-def>
</schema-def>
</output>
</nds>


(Snipped for brevity, schema is big. Surprisingly so!).

Note, that when the connection is opened, while we were getting failures we never saw these lines, which is very useful thing to know, but is impossible to know, if you have never yet succeeded.

[11/21/13 15:31:31.189]:eDir Driver ST:EDirectory Driver: Opening SSL connection
[11/21/13 15:31:31.750]:eDir Driver ST:EDirectory Driver: Host name: acme-edir.root.acme.com
[11/21/13 15:31:31.750]:eDir Driver ST:EDirectory Driver: Port: 636


Thus we see it never properly got the port before, and now we can see what it should look like when it is working properly.

One other way to try and troubleshoot this, is to watch ndstrace with LDAP on the Server with changelog installed we see (using ndstrace on the console or iMonitor's Dstrace facility):


Search request:
base: "cn=schema"
scope:0 dereference:0 sizelimit:0 timelimit:0 attrsonly:0
filter: "(objectclass=*)"
attribute: "objectclasses"
attribute: "2.5.21.6"
attribute: "attributetypes"
attribute: "2.5.21.5"
Ignored duplicate LDAP attr "appPath" for NDS attr "App:Path" as it conflicts with existing LDAP attr "appPath" for NDS attr "App Path"
Skipping sending class definition "App:Application" because unable to find mapping for mandatory attribute "App:Path"
Unsupported or duplicate attribute: "2.5.21.6"
Unsupported or duplicate attribute: "2.5.21.5"
Sending search result entry "cn=schema" to connection 0x1848e000
Sending operation result 0:"":"" to connection 0x1848e000
DoSearch on connection 0x1848e000
Search request:
base: ""
scope:0 dereference:0 sizelimit:0 timelimit:0 attrsonly:0
filter: "(objectclass=*)"
attribute: "namingContexts"
attribute: "subschemasubentry"
attribute: "supportedExtension"
Sending search result entry "" to connection 0x1848e000
Sending operation result 0:"":"" to connection 0x1848e000
New TLS connection 0x37e6a80 from 132.238.5.201:43315, monitor = 0xffffffffbf040700, index = 12
Monitor 0xffffffffbf040700 initiating TLS handshake on connection 0x37e6a80
DoTLSHandshake on connection 0x37e6a80
BIO ctrl called with unknown cmd 7
Completed TLS handshake on connection 0x37e6a80
DoBind on connection 0x37e6a80
Bind name:cn=idm,ou=it,o=acme, version:3, authentication:simple
Sending operation result 0:"":"" to connection 0x37e6a80
DoExtended on connection 0x37e6a80
DoExtended: Extension Request OID: 2.16.840.1.113719.1.14.100.200
Unable to find extension handler 2.16.840.1.113719.1.14.100.200 in extension list
Sending operation result 2:"":"Unrecognized extended operation" to connection 0x37e6a80
DoExtended on connection 0x37e6a80
DoExtended: Extension Request OID: 2.16.840.1.113719.1.14.100.200
Unable to find extension handler 2.16.840.1.113719.1.14.100.200 in extension list
Sending operation result 2:"":"Unrecognized extended operation" to connection 0x37e6a80
DoUnbind on connection 0x1848e000
Connection 0x1848e000 closed

Lets parse that trace a bit:

First we see the IDM account login.

DoBind on connection 0x37e6a80
Bind name:cn=idm,ou=it,o=acme, version:3, authentication:simple
Sending operation result 0:"":"" to connection 0x37e6a80


Result of 0 means success.

DoExtended on connection 0x37e6a80
DoExtended: Extension Request OID: 2.16.840.1.113719.1.14.100.200


Now it does an LDAP Extended bind, using a specific LDAP extension. If you look at the RootDSE object on any LDAP server, it should return the supportedExtension attribute values. Here it is trying to connect to the changelog facility, that is installed on the remote eDirectory server.

Unable to find extension handler 2.16.840.1.113719.1.14.100.200 in extension list
Sending operation result 2:"":"Unrecognized extended operation" to connection 0x37e6a80


But it is failing, since the changelog did not properly register with eDirectory (and nldap).

In other words it requests an extended operation, returns, nope, Unrecognized. Changelog did not load.

We did an ndsd restart, it looks like we forgot to restart ndsd after installing changelog and then things got better. I foolishly did not snag the trace of a successful connection, to show what it SHOULD look like. (If anyone would like to post a comment with the same snippit as above for a successful connection that would be cool, and I will update the article).

I do wish they would update the shim to offer a better error on an incorrectly entered connection string. Or perhaps assume a default port if none specified, which would have saved me a fair bit of time in troubleshooting this issue. At the very least some kind of informative error would be much nicer than the less useful error it did give. Thus the root cause of the negative 1 error is that the connect string I entered was missing the colon followed by the port. Oops. My mistake.

Amusingly we ran into another error. We were getting an error that the SSL keystore is not found. But we checked the path and it was correct. Checked the value set, and all looked good. But again, as always, read the trace, since in this case, we could not see the error in the settings. But in the trace, it was actually obvious.

The cause here was a leading space in the path to the keystore file. I guess a path starting with a space does not work on Unix or Linux. I wonder if this would cause an issue on a Windows based remote loader. When you look at the trace of the <init-params> you can see the leading space much more easily than you can see it in Designers view.

<keystore display-name="Keystore path for SSL certificate(s)"> /usr/local/adm/etc/idm-certs.jks</keystore>


For further debugging help, you can enable Changelog debugging on the remote eDirectory side. Up till now, we looked almost exclusively on the engine side trace for errors. Then we looked at LDAP trace on the remote eDirectory server to see that the changelog was not loading (restart of ndsd fixed that). But we have not looked at the actual changelog debugging on the remote eDirectory side. To enable this logging you need to add the DXML and LDAP switches on the server running the Changelog module.

The Knowledge Base article on tracing the Changelog module is here:
https://www.novell.com/support/kb/doc.php?id=7014036

Labels:

How To-Best Practice
Comment List
Related
Recommended