Recently I was confronted with an eDirectory server that was acting strangely. As sometimes happens in consulting, I had no personal history or notes to reference. I did not know who set it up, what they chose to do, why they chose to do it, or what had happened to it since. I only knew what I could see.
The server was running eDirectory 9.04. My intention was to upgrade it to eDirectory 9.1, a simple task I expected to take an hour or so.
After the usual precautions before an upgrade, I started with the installation wrapper (nds-install). One of the features added to it a few years ago is the eDirectory status and health check (ndscheck). While you can bypass this, it is a good idea to let it run, see what it finds, and correct anything it complains about if possible. In an environment I know to be healthy I might skip it, but when coming in blind like this, it is worth letting it run.
This is where the trouble started. It prompted for admin credentials; I typed in the DN of the admin user and the password and ... ERROR -669: Login failed. Hm. Typing more carefully this time ... ERROR -669: Login failed. This is a two-server tree, so maybe the notes about the admin credentials were wrong. I tried ndslogin on the other server and it worked fine, so now I was sure I had the right DN and password. I tried ndslogin on this server: -669 again.
Further debugging and research turned up TID 7017232 (https://www.novell.com/support/kb/doc.php?id=7017232), which describes how to disable Enhanced Background Authentication (EBA) as a troubleshooting step. It carries some rather dire warnings: once EBA is enabled it can never be disabled or removed, and disabling it for more than a short period of time can break it. It rather pointedly does not include any information on how to fix it once it is broken. So I tested this, and found that with EBA disabled, ndscheck, ndslogin, and dxcmd all work normally; with EBA enabled, they all fail. That seemed to me to show that the strange problems I was experiencing were related to EBA.
Besides the local NCP utilities all failing, I found that LDAP worked fine: dxcmd in LDAP mode worked, while dxcmd in NCP mode failed. I also noticed that for a period after a restart, "ndsrepair -E" would show -669 errors in the replica synchronization status. That fits, since every server in an eDirectory tree is an NCP client of the other servers, and EBA is both a client and a server enhancement.
So, with EBA broken, how do we fix it? How do we even debug it? How is it supposed to work?
The eDirectory documentation has some basic and interesting information on EBA: https://www.netiq.com/documentation/edirectory-9/edir_admin/data/b1g4eoc0.html
There is some good information in there, but it is woefully lacking on how to fix EBA when (not if) it breaks. Here I had a broken server, so it can demonstrably break. Now what?
When I started, this two-server tree had eDirectory 8.x (Server1) and 9.04 (Server2). Server2 is the broken one, with EBA configured. Server1, being eDirectory 8, has no support for EBA at all. I thought newer code might behave better than older code, so I upgraded Server1 to eDirectory 9.1, and then (with EBA disabled) upgraded Server2 to eDirectory 9.1 as well. Testing with "ndslogin" showed that this had not helped.
My next attempt to fix this was to enable EBA on Server1. I ran "ndsconfig upgrade --configure-eba-now yes" and set up EBA on Server1. Testing with "ndslogin" showed that this had not helped either, but at least I now had a working EBA-enabled server to reference for how it is supposed to work.
Looking back, there are some things I should have checked before I enabled EBA on Server1. They are not documented, so there was no way to know what I should have looked at, but it would have been interesting.
So, with a working server and a broken server side by side, I did some spelunking into how EBA works. What does "EBA configured" actually mean?
I found that when EBA is configured, the T= object is updated with two attributes.
EBATreeConfiguration: This seems to be the data you see in iManager on the General tab, i.e. the CA configuration data. It is stored in a stream file (Server1: 1AE6.nds) which replicates normally to other servers. It seems to be in a binary certificate format, but I have not yet found a way to dump it out with openssl.
EBAPartitionConfiguration: I don't know what this one is yet, but its data is also in a stream file (Server1: 1AE7.nds), which also seems to be a binary certificate format; so far openssl won't touch it either. This one also replicates normally.
You can see these attributes in iMonitor, along with their contents in hex. That turns out to be helpful information. Since these binary attributes are undocumented, I do not know what you will see there if you look. In mine, both had the same first 20 bytes, starting with (30 80 02 01 01 06 0b 60).
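Those leading bytes read like ASN.1 encoding, which may explain the openssl trouble. This is my own reading, not anything NetIQ documents: 30 is a SEQUENCE tag, 80 is BER's indefinite-length form (which DER does not allow, and X.509 certificates are normally DER-encoded), 02 01 01 is INTEGER 1, and 06 0b introduces an 11-byte OBJECT IDENTIFIER whose first content byte, 60, decodes to the arcs 2.16. That is consistent with, but does not prove, Novell's 2.16.840.1.113719 arc, which eDirectory uses for its schema OIDs. The function below (describe_eba_header is a hypothetical name) is a minimal sketch that decodes only those first 8 bytes of a stream file:

```shell
#!/bin/sh
# describe_eba_header FILE
# Hypothetical helper: decode the first 8 bytes of a stream file as ASN.1
# tag/length octets. Only the fixed prefix seen in my tree is interpreted.
describe_eba_header() {
    # First 8 bytes as hex pairs, e.g. "30 80 02 01 01 06 0b 60"
    hdr=$(od -An -tx1 -N 8 "$1" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
    set -- $hdr
    if [ $# -lt 8 ]; then
        echo "file shorter than 8 bytes"
        return 1
    fi
    if [ "$1" != "30" ]; then
        echo "not an ASN.1 SEQUENCE"
        return 1
    fi
    if [ "$2" = "80" ]; then
        echo "SEQUENCE, indefinite length (BER, not DER)"
    else
        echo "SEQUENCE, definite length"
    fi
    [ "$3 $4 $5" = "02 01 01" ] && echo "INTEGER 1"
    if [ "$6" = "06" ]; then
        # $7 is the OID length octet; $8 is its first content byte, which
        # packs the first two arcs (valid only when that byte is below 0x80).
        len=$((0x$7))
        first=$((0x$8))
        if [ "$first" -lt 40 ]; then
            arcs="0.$first"
        elif [ "$first" -lt 80 ]; then
            arcs="1.$((first - 40))"
        else
            arcs="2.$((first - 80))"
        fi
        echo "OBJECT IDENTIFIER, $len content bytes, first arcs $arcs"
    fi
}
```

If the data really is BER, "openssl asn1parse -inform DER -i -in 1AE6.nds" may be able to walk the whole structure, since asn1parse tolerates indefinite lengths where "openssl x509" expects a DER certificate. I have not verified that against these particular files.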
I found the stream files for these attributes by hunting through the instance DIB directory with:
for i in ./*.nds; do hexdump -C -v "$i" | head -1 | grep -q "^00000000  30 80 02 01 01 06 0b 60" && ls -l "$i"; done
-rw------- 1 root root 1894 Aug 2 17:17 1AE7.nds <--- T=xxxxx EBAPartitionConfiguration attribute
-rw------- 1 root root 1811 Aug 2 17:17 1AE5.nds <--- X.509 Certificate for ncp://192.168.1.11:524
-rw------- 1 root root 245 Aug 14 16:10 1AE6.nds <--- T=xxxxx EBATreeConfiguration attribute
-rw------- 1 root root 1809 Jan 12 2018 34F.nds <--- b0rked X.509 Certificate
-rw------- 1 root root 1894 Aug 2 17:17 9D4.nds <--- T=xxxxx EBAPartitionConfiguration attribute
-rw------- 1 root root 245 Aug 14 16:10 9D3.nds <--- T=xxxxx EBATreeConfiguration attribute
You can do something similar in your own DIB directory once you have seen what the first few bytes of the data look like in iMonitor. It would be interesting if they are the same as what I see here. It would also be interesting if they are different; if I had to guess, I would guess that they will be. Running "hexdump -C -v" on one of these files shows the entire stream file, so you can confirm that it matches what iMonitor displays. Also, one file is much larger than the other: the EBAPartitionConfiguration stream is 1894 bytes, while EBATreeConfiguration is only 245. That may be significant in figuring out what the binary data is.
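The same hunt can be extended to spot duplicate copies, such as the pairs in the listing above (1AE7.nds/9D4.nds and 1AE6.nds/9D3.nds match by size and date). A minimal sketch, assuming the same 8-byte prefix and that the stream files end in .nds; list_eba_streams is a hypothetical helper that prints cksum output (CRC, size, path) for each matching file, sorted so byte-identical files land on adjacent lines:

```shell
#!/bin/sh
# Prefix seen in iMonitor on my tree; substitute whatever yours shows.
PREFIX="30 80 02 01 01 06 0b 60"

# list_eba_streams [DIR]
# Hypothetical helper: print "CRC size path" (cksum format) for every .nds
# file in DIR whose first 8 bytes match PREFIX, sorted by CRC so that
# byte-identical copies end up next to each other.
list_eba_streams() {
    for f in "${1:-.}"/*.nds; do
        [ -f "$f" ] || continue
        hdr=$(od -An -tx1 -N 8 "$f" | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
        if [ "$hdr" = "$PREFIX" ]; then
            cksum "$f"
        fi
    done | sort -n
}
```

Run it as "list_eba_streams /path/to/dib" (the DIB path depends on the instance). The cksum CRC is not a cryptographic hash, but it is plenty to tell whether two stream files are identical copies.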
In doing this, I found a third stream file (Server1: 1AE5.nds) with similar binary certificate data, but which does not replicate. This shows that EBA also stores server-specific information: each EBA server gets a certificate issued by the EBA CA, and we can see those. Where do server-specific attributes get stored? On the [Pseudo Server] object, of course.
You can look at the [Pseudo Server] in iMonitor. Go to the Agent Configuration page, then scroll down to the bottom. There is a link there to the [Pseudo Server] object.
Looking at the [Pseudo Server] in iMonitor, I found that it carries a ServerEBAEnabled attribute (Boolean syntax, set to "true"). I suspect this is really where the eDirectory process decides at startup whether EBA is enabled, so, at least in theory, removing it would be a way to disable EBA on a server. Of course, we normal people cannot edit the [Pseudo Server] object; that requires special access only available to NetIQ Support in the form of DSDUMP.
I am guessing that the [Pseudo Server] object also holds a link to the non-replicating stream file with the certificate data, but iMonitor will not show it. I know that iMonitor hides some "sensitive" attributes, so I suspect that is why I cannot see it; iMonitor will not, for example, display the RSA Private Key attribute, even though we know it is there.
It is useful and interesting at this point to see what they intended you to see for EBA. Using iManager, configured to support EBA (https://www.netiq.com/documentation/edirectory-9/edir_admin/data/b1gk96gk.html), you can use the EBA task to show the tree configuration details for the EBA Certificate Authority and the certificates that the Certificate Authority has issued to the servers. You can even see the certificate details, which is nice. It would be nicer still to figure out what format these are stored in within those stream files, so that they could be dumped out with openssl.
Poking around in iManager, I found that the server certificates issued by the EBA CA show the Subject Name as UID=…, which can be traced back to the GUID of the [Pseudo Server] object for the NCP server they were issued to. There is clearly an attribute being used here, which is why I suspect it lives on the [Pseudo Server] and iMonitor is hiding it from me.
Spelunking through the DIB directory on Server2, I found a non-replicating stream file from January 2018 that seems to be the certificate data issued for EBA, and it does not link back to a valid EBA CA. That is most likely what this server is actually complaining about.
As I said at the top, I do not know how this server got into the state it is in. I am assuming that somebody configured EBA during the eDirectory 9.04 install (the installer does, after all, ask whether you want to configure EBA now). That should have created the EBA CA and issued a certificate to this server. Knowing what I know now, before I configured Server1 for EBA I should have looked at the T= object in iMonitor to see whether it already had the EBA configuration attributes. I do not think it did, because the configuration on Server1 should not have proceeded to create them if they were already there. So EBA was configured on Server2, and somehow the EBA CA went missing. That breaks EBA, because the certificate in use no longer links back to anything.
It should be possible, at this point, to tell Server2 to abandon the certificate it has, and go request a new one from the Certificate Authority. It does not seem to be possible to do this. There is, of course, no documented way to invalidate a broken certificate and tell the EBA client (server) to go get a new one.
I tried a few ways to convince Server2 to abandon this broken certificate and request a new one.
The sticking point here is that EBA interferes with authentication. Authentication is required to configure EBA. If I disable EBA for authentication, then EBA cannot be configured. If I leave EBA enabled for authentication, then authentication fails, and EBA cannot be configured. It is a wonderfully secure chicken vs. egg problem.
As far as I know, there is no way to edit the [Pseudo Server] object outside of dsdump, so a Service Request was opened. After repeating much of the above, plus a few other investigative steps, we were eventually able to remove the EBA configuration from Server2 with dsdump. Afterward, "ndsconfig upgrade --configure-eba-now yes" worked on Server2, and EBA is now working correctly. The EBA status now reports:
Server is EBA enabled
INFO: EBACA = false
INFO: NCPCA validity start = Thu Sep 13 09:53:30 2018
INFO: NCPCA validity end = Sun Jul 30 17:17:10 2028
INFO: CRL validity start = Tue Sep 25 12:58:50 2018
INFO: CRL validity end = Tue Oct 2 12:58:50 2018
INFO: EBACA certificate validity start = Thu Aug 2 17:17:10 2018
INFO: EBACA certificate validity end = Sun Jul 30 17:17:10 2028
The other way out of this might have been to delete and recreate the instance. Since this is a two-server tree, I tried adding a new instance, and that failed with the replica add stuck at "new" for the [Root] partition. I suspect that EBA was preventing the partition operation, though I cannot see any way to prove that; the documentation does say that EBA will prevent partition operations that would attempt to remove EBA. The next step down that road would have been to treat Server2 as a dead server, nuke the DIB directory, clean up the replica rings, and then recreate it. I did not try that.