eDirectory Quirks: /etc/init.d/ndsd and the instances.0 file

Earlier this week I stumbled upon this quirk because of some sloppy editing of an instances.0 file, the file on Linux/Unix that tells eDirectory about all of the instances owned by the 'root' (UID 0) user. The file lives under /etc/opt/novell/eDirectory/conf/.edir/ (or under $NDSBASE/etc/opt/novell/eDirectory/conf/.edir/ for those running a non-root install under $NDSBASE) and is just a plain old text file that is meant to hold the path to one instance's nds.conf file per line. For example, if you had three instances in different directories, the file would contain three lines, each the absolute path to one instance's nds.conf file.
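As an illustration only, here is a small sketch of what such a file could hold. The three paths are hypothetical, and a temp file stands in for the real instances.0 so nothing on a live system is touched.

```shell
# Illustration only: the three paths are hypothetical, and a temp file
# stands in for /etc/opt/novell/eDirectory/conf/.edir/instances.0.
instances_file=$(mktemp)
cat > "$instances_file" <<'EOF'
/etc/opt/novell/eDirectory/conf/nds.conf
/home/edir/inst1/nds.conf
/home/edir/inst2/nds.conf
EOF

# One nds.conf path per line, so the non-empty line count is the instance count.
instance_count=$(grep -c . "$instances_file")
first_instance=$(head -n 1 "$instances_file")
echo "instances: $instance_count, first: $first_instance"
```

The first line matters more than the others, for reasons that become clear further down.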

The result is that all of the eDirectory utilities can quickly and easily find all of a given user's instances, and if you want to you can even have a single application find all users' instances, since all of the instances.<UID> files are in one specific path. There are some limitations, namely that the only instance started and stopped automatically is the first one owned by the 'root' user, unless you use something like the ndsd-multi script I created a few years ago to handle multiple instances properly ("ndsd-multi: a way to start multiple eDirectory instances simultaneously in Linux").

So what was the quirk? Well, I foolishly had a bit of code to clean up instances quickly and brutally, and it removed the lines from my root user's instances.0 file using the following line:
echo '' > /etc/opt/novell/eDirectory/conf/.edir/instances.0

Seems pretty harmless. It simply says to write nothing (two single quotes with nothing between them) to a file, overwriting it (vs. appending to it with >>), so the file should end up empty. Anybody see the subtle problem there? 'echo', the command, always writes a newline along with whatever you tell it to print to standard out (STDOUT) unless you tell it not to, like this (notice the added '-n'):
echo -n '' > /etc/opt/novell/eDirectory/conf/.edir/instances.0
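
A quick way to see the difference is to count bytes. The sketch below works in a throwaway temp directory (the file names are made up) and also shows two other common ways to get a truly empty file, printf with an empty format and a bare redirection, neither of which relies on echo's '-n' option.

```shell
demo_dir=$(mktemp -d)

echo '' > "$demo_dir/echo_plain"      # what the cleanup line did
printf '' > "$demo_dir/printf_empty"  # printf adds no newline on its own
: > "$demo_dir/truncated"             # ':' outputs nothing; '>' truncates

# Arithmetic expansion strips the padding some wc versions add.
size_echo=$(( $(wc -c < "$demo_dir/echo_plain") ))
size_printf=$(( $(wc -c < "$demo_dir/printf_empty") ))
size_truncated=$(( $(wc -c < "$demo_dir/truncated") ))

echo "echo '': $size_echo byte(s); printf '': $size_printf; ': >': $size_truncated"
```

Only the plain echo leaves a byte behind, and that one byte is the newline that causes all of the trouble described next.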

No big deal, the file has a single newline, a single byte, making up all of its contents. Seems harmless... you probably would not even see it if you looked and were not specifically seeking this kind of detail. Next step to cause a problem: create a new instance. What else would you do after cleaning eDirectory instances from a machine besides add a new one, of course? For that use ndsmanage or ndsconfig or whatever you prefer. When completed, the user's instances.0 file is now a bit more clearly broken (a blank first line followed by the new instance), but you won't see it unless you look explicitly, since all of the nds* tools (ndsmanage, ndsconfig which just created the instance and modified the instances.0 file, ndstrace, etc.) work properly. You can even start and stop eDirectory normally with the ndsmanage command, or list all of the instances with any old tool and see just what you'd expect: one instance and no errors.

Now for the problem: restart the machine. If you do this you'll see two things if you're watching on the console. First, eDirectory will not stop properly, and it will tell you shutdown failed. It didn't really, but neither did it really try to stop. In fact eDirectory is sitting there like a duck in crosshairs about to be sent a TERM, then a KILL by Linux when that silly process does not stop as its init script (/etc/init.d/ndsd) told it to.

Next, when you boot back up, you'll likely find out quickly that eDirectory did not start. If you watched the services loading you'll see another failure there when the init script tries to start it. Hope you didn't want to use that instance, or that you do not mind always remembering to stop it manually (lest you let the Linux box kill the process that doesn't go away on its own, possibly corrupting data); also hope you do not mind starting it manually after every reboot. The problem is how the /etc/init.d/ndsd script handles the instances.0 file; specifically it grabs the first instance (only) for startup and shutdown, and does so like this:
default_config_file=`head -1 $default_conf/.edir/instances.$ownerUid 2> /dev/null`

As you can see, the default_config_file variable is set to the first line of the instances.$ownerUid file (instances.0) without any verification that the first line isn't garbage. The other utilities (ndsmanage, ndsconfig, ndstrace, etc.) do not care about blank lines and instead look only for valid lines and then operate on those. The result is that if your first instance sits on the second (or any other) line, the nds* utilities will work nicely but the init script will not, until you clean up those leading lines. The problem also exists if your first line is not blank, but instead has some spaces in it. Really anything other than a full valid perfect path to an nds.conf file will cause you grief of the crash-on-shutdown-and-fail-on-startup kind. Of course, avoid this.
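
The mismatch is easy to reproduce without touching a real server. This is a minimal sketch that uses a temp file in place of instances.0 and a made-up nds.conf path; the "other tools" line is only an approximation of the behavior described above, not their actual code.

```shell
# Throwaway stand-in for instances.0; the nds.conf path is hypothetical.
instances_file=$(mktemp)
echo '' > "$instances_file"                           # the sloppy "cleanup"
echo '/home/edir/inst1/nds.conf' >> "$instances_file"  # instance added later

# What the init script sees: the first line, blank or not.
init_view=$(head -1 "$instances_file" 2> /dev/null)

# Roughly what the other tools do (an assumption based on the text
# above): skip blank lines and use what is left.
tools_view=$(grep -v '^[[:space:]]*$' "$instances_file")

echo "init script sees: '$init_view'"
echo "other tools see:  '$tools_view'"
```

The init script's view comes back empty while the filtered view still finds the instance, which is exactly the split between "works in ndsmanage" and "fails at boot".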

How can you catch this problem? Well, first, be very careful with those touchy instances.0 files. If you rebuild it per the install docs, or if you add instances manually when restoring something, or any other time you're in there, be SURE you have the desired instance first, and without anything before it. Comments are not allowed in these instances.UID files either, so nothing starting with a ';' or '#' to try to comment what you're intending to do in there (those will cause all of the other nds* utilities to break as well, as it turns out).
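
A check along these lines could be run after any manual edit. It is only a sketch: check_instances_file is a hypothetical helper name, not something shipped with eDirectory, and it looks only at the issues named in this article.

```shell
# check_instances_file is a hypothetical helper, not part of eDirectory.
# It flags what this article warns about: a first line that is not an
# absolute path, plus blank, whitespace-only, or comment lines anywhere.
check_instances_file() {
    file=$1
    status=0
    first_line=$(head -n 1 "$file" 2> /dev/null)
    case $first_line in
        /*) ;;    # looks like an absolute path
        *)  echo "first line is not an absolute path: '$first_line'"
            status=1 ;;
    esac
    if grep -q -e '^[[:space:]]*$' -e '^[[:space:]]*[#;]' "$file" 2> /dev/null; then
        echo "blank or comment lines present"
        status=1
    fi
    return $status
}
```

Usage would be something like 'check_instances_file /etc/opt/novell/eDirectory/conf/.edir/instances.0 && echo OK'. It does not verify that the nds.conf file actually exists or that the path has no trailing spaces; those checks could be added in the same style.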

To recreate the file quickly with invalid things removed, try these quick commands (assuming instances.0 is the file, since that is the main one handled by /etc/init.d/ndsd as mentioned above):
# Create a backup file
cp -a /etc/opt/novell/eDirectory/conf/.edir/instances.0 /root/
# Filter the backup (dropping blank, whitespace-only, and '#' or ';' comment lines) and overwrite the original with the result.
grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*[#;]' /root/instances.0 > /etc/opt/novell/eDirectory/conf/.edir/instances.0

If the two files above (in /root and in the original location) are identical, then nothing was changed and no harm was done. If they are different, the new file should now be cleaned up.
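
One way to make that comparison is with cmp and diff. The sketch below wraps them in a hypothetical helper, report_cleanup, which takes the backup copy and the rewritten file as its two arguments.

```shell
# Hypothetical helper: report whether the cleanup changed anything.
# $1 is the backup copy, $2 is the rewritten file.
report_cleanup() {
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "cleaned"
        diff "$1" "$2"    # lines marked '<' exist only in the backup
    fi
    return 0
}
```

Usage would be something like 'report_cleanup /root/instances.0 /etc/opt/novell/eDirectory/conf/.edir/instances.0'.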

This problem can be seen in multiple versions of eDirectory up through 8.8 SP8 Patch 1. Hopefully a fix will come soon since it's an easy one, but prioritization is out of my hands. You're welcome to implement it yourself, but beware that if you do so in /etc/init.d/ndsd, every eDirectory patch released until NetIQ ships a fix will probably overwrite your change and re-break things for you. The fix is to change line 118 of the current script from this:
default_config_file=`head -1 $default_conf/.edir/instances.$ownerUid 2> /dev/null`

to this:
default_config_file=`grep -v -e '^[[:space:]]*$' $default_conf/.edir/instances.$ownerUid 2> /dev/null | head -1`

Nothing magical... it just uses grep first to remove any lines that are empty or hold nothing but whitespace, and then grabs the first line of whatever remains (presumably an absolute path to an nds.conf file).
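
The filter-then-head idea can be tried safely outside the init script. In this sketch first_valid_instance is a hypothetical wrapper name, the sample paths are made up, and '[[:space:]]' is used as the POSIX spelling of what GNU grep also accepts as '\s'.

```shell
# first_valid_instance is a hypothetical wrapper around the same pipeline.
first_valid_instance() {
    grep -v -e '^[[:space:]]*$' "$1" 2> /dev/null | head -n 1
}

# Sample file: an empty line, a spaces-only line, a tab-only line,
# and then two made-up instance paths.
sample=$(mktemp)
printf '\n   \n\t\n/home/edir/inst1/nds.conf\n/home/edir/inst2/nds.conf\n' > "$sample"

default_config_file=$(first_valid_instance "$sample")
echo "would start: $default_config_file"
```

A missing file simply yields an empty result, so the script's existing empty-string check still has something sensible to test.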

In case it helps in tracking down bugs like this in shell scripts, here's a brief primer in doing so with bash: bash -xv

That's right, all you need to know is 'bash -xv'. What does that do? Well, the '-v' prints lines as they are received by bash (commands, the things eValuated) and '-x' shows how they are being run or eXecuted. The '-x' is semi-well-known, and either option can be set in an already-running shell (like the one you open anytime you SSH or otherwise login to Linux or Unix, assuming you have bash as your default interpreter) by running 'set -x' or 'set -v'. To turn off the options, change the '-' for a '+' (remember the arithmetic stuff, and how those two are opposites) by running 'set +x' and 'set +v'. These two options can be added together at the top of shell scripts in the shebang line, so instead of having the first line of a shell script be simply:
#!/bin/bash

have it instead enable debugging by default, particularly during the creation phase of a script (not for production use):
#!/bin/bash -xv
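
Tracing can also be limited to the part of a script under suspicion by wrapping it in 'set -x' and its counterpart 'set +x'. The sketch below writes a small stand-in script to a temp file (its contents are purely illustrative), runs it, and keeps the trace, which bash sends to standard error, separate from the normal output.

```shell
# Stand-in script written to a temp file; its contents are illustrative.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
greeting="hello"          # not traced
set -x                    # tracing on from here
name=`echo world`
set +x                    # tracing off again
echo "$greeting $name"    # not traced
EOF

# Trace output goes to stderr; capture it apart from normal output.
trace_log=$(mktemp)
normal_output=$(bash "$script" 2> "$trace_log")

echo "stdout: $normal_output"
echo "trace:"
cat "$trace_log"
```

Only the lines between the two 'set' calls show up in the trace, which keeps the output manageable in longer scripts.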

You'll notice incredible amounts of output the first time you do it, and while it is a little daunting it is also very rewarding to be able to see exactly what is happening, line by line. For example, here's our little problem pre-patch:
ownername=`ls -l  $0 |awk '{print $3;}'`
ls -l $0 |awk '{print $3;}'
++ ls -l /etc/init.d/ndsd
++ awk '{print $3;}'
ownerUid=`id $ownername | cut -d'=' -f2|cut -d'(' -f1 2>/dev/null`
id $ownername | cut -d'=' -f2|cut -d'(' -f1 2>/dev/null
++ id root
++ cut -d= -f2
++ cut '-d(' -f1
default_config_file=`head -1 $default_conf/.edir/instances.$ownerUid 2> /dev/null`
head -1 $default_conf/.edir/instances.$ownerUid 2> /dev/null
++ head -1 /etc/opt/novell/eDirectory/conf/.edir/instances.0

if [ "$default_config_file" = "" ]
+ '[' '' = '' ']'

Notice that some lines start with a '+' and some do not. The ones that do not are the commands as they are written in the script (or whatever is inputting commands), so these should look fairly familiar. They are printed because of the '-v' option of bash. The other lines, those starting with one or more '+' signs, are shown because of the '-x' option, and are what is actually running (variables interpreted, with more '+' signs marking deeper nesting, such as commands run inside backticks). The first line above shows ownername, a variable, being set to the contents of something in backticks, meaning the output of whatever is going to be run within the backticks. The next line is only what was within the backticks, meaning that is going to be run next. The next line starts with a '+', our first one in this little snippet of code, and is only the first command on that single line, the stuff before the pipe (|) character. Notice that the parameter passed in no longer shows up as '$0' but instead is the zero-th argument to the script (the script itself). Kinda neat, right? The next line with a plus is the stuff after the pipe. Once both pieces finish, ownername is finally set to whatever came back from the stuff between the backticks, and the same pattern repeats for ownerUid and default_config_file.
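
To see both kinds of line on something small, a one-line script is enough. This sketch uses a made-up script in a temp file and captures everything 'bash -xv' prints.

```shell
# One-line stand-in script; the variable and commands are illustrative.
script=$(mktemp)
cat > "$script" <<'EOF'
word=`echo ndsd | tr a-z A-Z`
EOF

# Both the -v echo of the source and the -x trace go to stderr.
trace=$(bash -xv "$script" 2>&1)
printf '%s\n' "$trace"
```

In the captured text, the line without any '+' is the source line echoed by '-v', the '++' lines are the two commands run inside the backticks, and the single '+' line is the assignment once their output is known.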

If you were running this like I was, you'd find out quickly that besides sending out a ton of output, processing time also increases (becomes slower) very quickly when doing debugging, which is another reason to NOT leave these options enabled in production. For troubleshooting/developing they are invaluable and I find myself just turning them on by default since once you get used to them the drawbacks are pretty small unless you're doing something really big (at least thousands of lines) or with tons of loops so that performance is really impacted.

From the output above notice that eventually we get to the following:
if [ "$default_config_file" = "" ]

This is the line where the variable default_config_file is compared with, well, nothing, and if the two are equal (meaning there is a zero-length value for the default_config_file variable, meaning nothing was found) then the next lines set the variable to a default value. That can probably be useful in some situations, but it makes no sense here, so in the end the ndsd script fails to do anything, start or stop, with the instance, which is exactly what we see on the lines from the '-x' output:
+ '[' '' = '' ']'

Here we see the comparison of zero-length string to zero-length string. The first zero-length string is what was in the default_config_file variable, and if we go back a line in the trace we see that its contents came from the 'head' command (you know the rest from the earlier parts of the article). From there the script goes ahead and sets default_config_file to that default value, which is meant for cases that do not apply in our situation and which does not help us.
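
The whole chain can be replayed in miniature. This is a sketch only: the directory is a temp stand-in for the real conf directory, and the fallback path is an assumption made for illustration, since the exact default the real /etc/init.d/ndsd picks is not shown here.

```shell
# Sketch only. The fallback path is an assumption for illustration;
# the real /etc/init.d/ndsd may choose a different default.
default_conf=$(mktemp -d)                     # stands in for the conf directory
mkdir -p "$default_conf/.edir"
echo '' > "$default_conf/.edir/instances.0"   # blank first line
ownerUid=0

default_config_file=`head -1 "$default_conf/.edir/instances.$ownerUid" 2> /dev/null`
if [ "$default_config_file" = "" ]
then
    # Reached for a blank first line exactly as for a missing file.
    default_config_file="$default_conf/nds.conf"
    used_fallback=yes
else
    used_fallback=no
fi
echo "config: $default_config_file (fallback: $used_fallback)"
```

The comparison cannot tell "no instances file at all" apart from "instances file with a blank first line", which is why the fallback kicks in when it should not.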


