This is a continuation of my OES2 Rolling Cluster Upgrade from NetWare guides.
While NSS cluster resources are fairly easy to migrate to OES2 SP2 Linux, there are a few caveats to be aware of, which I detail in this document.
Pre-Migration Tasks
Migrating a NetWare clustered data resource (i.e., an NSS volume) to OES2 SP2 Linux is fairly easy. Notify your users ahead of time, as the volume may be inaccessible for a significant period while the trustee file is rebuilt.
Make sure that both the NetWare and OES2 nodes have paths to the disks on your SAN. (You CAN fail over an NSS disk from NetWare to OES2 Linux and vice versa.) However, I don't recommend failing resources back; I'll explain why later.
On the NetWare server that is hosting the NSS clustered resource, run a NoRM inventory report on the ENTIRE volume. You want to get an idea of how many “objects” are on the volume, where objects means the number of files AND directories. You will use this count to estimate the “downtime”, that is, how long it will take Linux to rebuild the trustees. See the Novell OES2 online docs, section 16.3, Estimated Time Taken to Build the Trustee File on Linux.
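As a rough planning aid, the object count from the inventory report can be turned into a downtime estimate. The sketch below assumes the rate I quote later in this guide for our own hardware (about 20 minutes per 1 million files and directories); the function name and the MINUTES_PER_MILLION variable are made up for illustration, and your own rate may differ considerably.

```shell
#!/bin/sh
# Rough trustee-rebuild estimate. The default rate (20 minutes per 1 million
# objects) is what we observed on our hardware; override it for yours.
MINUTES_PER_MILLION=${MINUTES_PER_MILLION:-20}

# estimate_rebuild_minutes OBJECTS   (hypothetical helper name)
# OBJECTS = files + directories, taken from the NoRM inventory report.
estimate_rebuild_minutes() {
    objects=$1
    # Integer arithmetic, rounded up to the next whole minute.
    echo $(( (objects * MINUTES_PER_MILLION + 999999) / 1000000 ))
}

# Example: a volume with 3.5 million files and directories.
estimate_rebuild_minutes 3500000   # prints 70
```

Treat the result as a lower bound for the maintenance window you announce to users, not as a guarantee.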
Verify SAN connectivity
Make sure that the OES2 Linux server can see the actual disk (vdisk/LUN, etc.) and the path to it. (Again, my setup assumes multipathing is enabled.) The easiest method is to open a terminal and type:
multipath -ll
In our case, we have a Xiotech SAN, so the first number is the device ID of the vdisk. The size field reports the size of the vdisk. The four numbers (0:0:3:8) are the path and LUN: the first number is the path (in the case of the Xiotech, it's the “vport”) and the last number is the LUN. This is the easiest way to verify that the server can see the vdisk. In the Xiotech Icon Manager, each vdisk assignment has a LUN and a size. Compare the LUN and size from the Icon Manager to what you see in the multipath -ll output. If you see the disk, you can proceed. In this example, I purposely show a server that does NOT see the vdisks.
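If you have many vdisks, eyeballing the output gets tedious. The following is a minimal sketch that scans multipath -ll output for a four-part path tuple (such as 0:0:3:8) whose last field is the LUN you expect. The function name is hypothetical, and it only checks the LUN; you still need to compare the size against the Icon Manager yourself.

```shell
#!/bin/sh
# lun_in_multipath LUN   (hypothetical helper name)
# Reads `multipath -ll` output on stdin and succeeds if any line contains a
# four-part tuple (e.g. 0:0:3:8) whose last field equals LUN.
lun_in_multipath() {
    grep -Eq "(^|[^0-9:])[0-9]+:[0-9]+:[0-9]+:$1([^0-9:]|\$)"
}

# Usage on the OES2 node (as root):
#   multipath -ll | lun_in_multipath 8 && echo "LUN 8 visible" || echo "LUN 8 NOT visible"
```

The pattern requires a non-digit after the LUN, so asking for LUN 1 does not accidentally match LUN 18.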
If you do NOT see the disk, and you are sure that the vdisk IS assigned, you can issue a rescan of the SCSI bus:
rescan-scsi-bus.sh
The server should now report that it sees NEW disks. (Divide the count by 2, because each disk is seen once on each path.) However, I have had cases where the rescan does not say that new devices were found, BUT the multipath -ll command WILL show the new disk anyway.
Sometimes you may have to issue a “forcerescan”:
rescan-scsi-bus.sh --forcerescan
That's two hyphens (“dash-dash”) in front of the word “forcerescan”.
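The check, rescan, check again, force rescan sequence above can be sketched as a small script. This is only an illustration under my setup's assumptions: the ensure_lun and lun_visible names are made up, and the two commands are held in variables purely so the flow can be dry-run on a machine without a SAN.

```shell
#!/bin/sh
# Defaults are the commands used in this guide; override them for a dry run.
LIST_CMD=${LIST_CMD:-multipath -ll}
RESCAN_CMD=${RESCAN_CMD:-rescan-scsi-bus.sh}

# lun_visible LUN: true if a path tuple ending in :LUN appears in the listing.
lun_visible() {
    $LIST_CMD 2>/dev/null |
        grep -Eq "(^|[^0-9:])[0-9]+:[0-9]+:[0-9]+:$1([^0-9:]|\$)"
}

# ensure_lun LUN   (hypothetical helper name)
# Check, plain rescan, check, forced rescan, final check.
ensure_lun() {
    lun_visible "$1" && return 0
    $RESCAN_CMD >/dev/null 2>&1 || true
    lun_visible "$1" && return 0
    $RESCAN_CMD --forcerescan >/dev/null 2>&1 || true
    lun_visible "$1"
}

# Usage (as root):
#   ensure_lun 8 || echo "LUN 8 still not visible; check the vdisk assignment"
```

The script deliberately ignores what the rescan itself reports and trusts only the multipath listing, since the rescan does not always announce new devices.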
Make sure that you see the pool for the resource you are going to migrate. IF the list is blank or you don't see the pool, you may have to back out and select Devices again.
Fixing Missing Pools on OES2
However, it's possible you may STILL not see the pools. In our case, some of our disks were originally created with NetWare 6.0. If so, then you need to find the NetWare node that is running the resource and type:
mm upgrade partitions
Hit ENTER.
Then type:
cluster scan for new devices
Hit ENTER again
Then, on the OES2 server type this at a terminal:
evms_activate
and THEN type:
nssmu
Open the Devices menu, then hit ESC and open the Pools menu. You should see the pool now.
Wait for the resource to load. Again, iManager refreshes every 30 seconds by default. If the resource hasn't loaded in about 2 minutes, or shows comatose, then you have a problem. Otherwise it should show: running. The FIRST time you migrate, it can take longer than normal to load.
Be patient.
Sync NSS Trustees
After you online the volume, wait a few minutes, then map a drive and REFRESH the view. You are looking for a folder called ._NETWARE (it's hidden, so it shows up in a lighter font IF Windows Explorer is set to show hidden folders).
Typically you will only see the first two files. Depending upon the SIZE of the volume and how many files/trustees there are, the trustee rebuild can take about 20 minutes per 1 million files/directories on our servers. Our server hardware is HP BL460c with dual-core Intel Xeon 3.0 GHz CPUs and 4 GB of RAM.
Wait about 10-15 minutes and then F5 to refresh the view.
You SHOULD see the .trustee_database.xml file grow from 0 KB to some larger size (in the above screenshot it is 1 KB because there are hardly any trustees for the files on that volume). I do not know of a good way to tell when the sync is done: a 1 KB file could indicate a volume with hardly any trustees, or a rebuild that is still running. I know of only two ways to tell for sure. One is to run the sync process manually, which I don't recommend, because the sync should run AUTOMATICALLY and you will end up with two syncs running at the same time, progressing even more slowly. The other is to enable full debug logging for ncpcon and watch the ncp2nss.log file. It would be nice if the default logging indicated when the rebuild is done.
The log file is located at:
/var/opt/novell/log/ncp2nss.log
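Instead of pressing F5 every few minutes, you can poll the size of the .trustee_database.xml file from the OES2 server. The sketch below is only a heuristic: it returns once the file has been non-zero and unchanged for a number of consecutive polls, which, as noted above, is NOT proof that the rebuild has finished. The function name is made up, and the path in the usage comment assumes the default NSS mount point of /media/nss/VOLUME.

```shell
#!/bin/sh
# wait_for_stable_size FILE INTERVAL_SECONDS STABLE_CHECKS   (hypothetical helper)
# Polls FILE and returns once its size has been non-zero and unchanged for
# STABLE_CHECKS consecutive polls, then prints the final size in bytes.
# Loops forever if the file never appears or never stops changing.
wait_for_stable_size() {
    file=$1
    interval=$2
    needed=$3
    last=-1
    stable=0
    while [ "$stable" -lt "$needed" ]; do
        if [ -f "$file" ]; then
            size=$(wc -c < "$file" | tr -d ' ')
        else
            size=0
        fi
        if [ "$size" -gt 0 ] && [ "$size" -eq "$last" ]; then
            stable=$((stable + 1))
        else
            stable=0
        fi
        last=$size
        sleep "$interval"
    done
    echo "$size"
}

# Usage (assumed default mount point; DATA1 is the example volume):
#   wait_for_stable_size /media/nss/DATA1/._NETWARE/.trustee_database.xml 60 10
```

With a 60-second interval and 10 checks, the file must sit unchanged for about ten minutes before the function returns; lengthen both on very large volumes.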
IF the trustees don't sync (you'll know this the next day or after X minutes/hours), you can sync manually by running the following command. Do this FROM the server itself via iLO, in case your PuTTY/VNC session times out:
Where VOLUME is the name of the volume you migrated. Note that it's the VOLUME name and not the cluster resource name. So: CS1-DATA1 is the resource, but the VOLUME is: DATA1.
The above was a VERY small volume, hence the quick time. I've had some volumes take 20 minutes and others take 2.5 hours.
Please note a few things. The automatic sync only works the FIRST time that you migrate the NSS volume from NetWare to OES2 Linux. So if the sync doesn't complete correctly the first time, OR you migrate the resource BACK to NetWare and then migrate it again to OES2 Linux, the trustees won't sync again. This is why I strongly suggest that you don't migrate an NSS resource back and forth between NetWare and OES2 Linux (technically you CAN do it, I just don't think you SHOULD). Also, do not delete the ._NETWARE directory at the root of the volume (or its contents), as this is what is checked to determine whether the sync has already been run. (That is, if you delete it and migrate the resource to another OES2 server, the rebuild will start all over again.)