The purpose of this document is to give a real-world example of how we installed OES2 SP2 64-bit onto our 16-node NetWare cluster to perform a rolling cluster upgrade. Hopefully this guide covers most of the common things to watch out for and provides a good example to get you up and running quickly. Server names and IP addresses have been changed, and your environment may vary. If you are performing a rolling cluster upgrade from NetWare, we assume you are already using NSS and have a functioning cluster with an SBD partition.
The hardware in our setup consists of 16 HP BL460c (original release) blades in a c-Class chassis. The blades connect to a Xiotech Magnitude 3000 SAN through QLogic QMH2462 mezzanine cards and pass-through modules. As such, we will be using multipathing (this is NOT required for OES2 or for NCS).
Because our blades are diskless, we use HP ILO to boot from the CD/DVD media and install across the network from our SLES installation server. You CAN install OES2 by other methods (PXE booting, or using the actual media mounted via ILO, although ILO is much slower).
Our cluster nodes do not have master replicas on them. We have dedicated NON-Clustered replica servers. As such, it did not matter which nodes we migrated to Linux for eDirectory purposes. If one of your cluster servers is the Master replica server, or the server running the Certificate Authority, you must save that server for last (per Novell's documentation).
We also do not use DSFW, so we did not need to decide whether to install it. DSFW can only be installed at the initial installation (you cannot add it later), so if you plan to use DSFW on a clustered server, you must decide BEFORE you install OES2. (Personally, I would not install DSFW on a cluster; I would use a dedicated set of servers for it, but you do not have to follow this suggestion.)
Clustered items we are migrating: NSS (data cluster resources), GroupWise (POA, MTA, WebAccess, GWIA), iPrint, Apache Web Server.
Lastly, this is a ROLLING cluster upgrade. Running the cluster in mixed mode, with both NetWare and Linux nodes, provides some benefits, but there are also caveats. We assume you also have a NetWare cluster using NSS and are therefore already aware of the 2 TB NSS limit on devices (disks).
Make a list of all the server software and services you run on NetWare and which nodes each item is allowed to run on. Check with the vendors to make sure that any third-party software will work on OES2 SP2 Linux. You may need to upgrade or purchase different software (for example, Avanti Technologies TaskMaster does not currently run on OES2 Linux, so we had to work around this). Develop a schedule, and start slowly until you are comfortable. A rolling cluster upgrade/conversion is designed to be a temporary situation; ours was mixed for almost a month.
For example, our boot options line looks like this:
install=nfs://slesadmin.abc.com/install/SLES10SP3_64 hostip=10.10.1.10 netmask=255.255.0.0 gateway=10.10.1.1 nameserver=10.10.1.250
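If you build this line for several nodes, it can help to generate it from each node's settings rather than retyping it. The following is a minimal POSIX sh sketch; `make_install_line` is a hypothetical helper name, and it simply joins the same linuxrc boot parameters used above (install, hostip, netmask, gateway, nameserver) in that order.

```sh
# Hypothetical helper: print the linuxrc boot options for one node.
# Arguments (in order): install source URL, host IP, netmask, gateway,
# DNS server.
make_install_line() {
  if [ "$#" -ne 5 ]; then
    echo "usage: make_install_line URL HOSTIP NETMASK GATEWAY NAMESERVER" >&2
    return 1
  fi
  printf 'install=%s hostip=%s netmask=%s gateway=%s nameserver=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```

For example, `make_install_line nfs://slesadmin.abc.com/install/SLES10SP3_64 10.10.1.10 255.255.0.0 10.10.1.1 10.10.1.250` prints the example line above.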
Select Yes, and click Next
Check the "include add-on products" and click Next
Then click Next.
Click Yes, and then Next.
Make sure your time settings are correct for your environment and click Next. We'll configure NTP later.
Click Partitioning (we need to change the proposed layout).
Click "Create Custom Partition Setup" and then click Next. This is just an example; feel free to follow your own server setup guidelines. I strongly advise AGAINST using EVMS for your boot LUN. Use either a custom partition setup or LVM.
Why do we do this? We don't like to set up one big LUN (virtual disk, logical drive, whatever your RAID hardware calls it) for / (the root partition) using ReiserFS.
With OES2 Linux, you ALWAYS (in my opinion) want to set up a dedicated LUN for your "boot" code and a separate LUN for NSS (if using NSS). NEVER allocate all your disk space to one LUN. Think of it like NetWare, where your DOS partition was separate from the NetWare partitions, and the SYS volume was separate from your other volumes.
Select Custom Partitioning and then click Next.
We have one LUN here. This LUN is our "boot" LUN (the 15.0 GB LUN)
Select Primary Partition and click OK
Make sure to set the file system to Ext3, the size to 1.0 GB, and the mount point to /boot.
Click OK. Your boot disk can be whatever size you decide on. 1.0 GB may be a bit large for some folks. The main point is to make it a dedicated partition on the LUN-0/boot LUN.
Choose Primary Partition and click OK
Change "file system" to Swap.
Set the size to 2 GB, make sure the mount point is swap, and click OK.
Again, size this for your environment. The old rule was twice your system RAM, but our servers have 4.0 GB of RAM, and if we ever actually use swap, something is going on that we need to look at.
Click Create and Primary Partition again.
Change the file system to Ext3, let it use the rest of the LUN, and set the mount point to /.
Some folks may wish to put /var or even /opt on separate partitions. Again, this is just our layout; the main point is to keep /boot, swap and / on separate partitions at the very least.
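Once the server is installed, one quick way to confirm the layout is to check /etc/fstab. This is a sketch that assumes the file uses the standard field order (device, mount point, type, options, dump, pass); `check_layout` is a hypothetical name, and it only tests for the three separate entries described above (ext3 /boot, swap, ext3 /).

```sh
# Hypothetical check: does an fstab-format file define separate /boot, swap
# and / entries with the file systems used in this guide?
# $1: fstab file to read (defaults to /etc/fstab).
check_layout() (
  fstab=${1:-/etc/fstab}
  awk '
    $1 ~ /^#/ || NF < 3 { next }          # skip comments and short lines
    $2 == "/boot" && $3 == "ext3" { boot = 1 }
    $3 == "swap"                  { swap = 1 }
    $2 == "/"     && $3 == "ext3" { root = 1 }
    END {
      if (!boot) print "missing: /boot on ext3"
      if (!swap) print "missing: swap"
      if (!root) print "missing: / on ext3"
      exit !(boot && swap && root)        # 0 only if all three were found
    }' "$fstab"
)
```

Run `check_layout` with no argument to check /etc/fstab; it prints whatever is missing and returns non-zero if the layout doesn't match.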
Always uncheck Novell AppArmor (unless you really want to use it). The OES selections depend on what type of install you are doing. This example sets up a blade for the cluster; we will select the NCS software later. However, ALL OES2 servers should have the following items selected (in our environment):
We install NSS even if we aren't going to use it right away (again, you never know when you may need it). I find the NCP server handy because you can use native Linux Ext3 partitions and attach to them from Windows PCs via the Novell Client (instead of mucking around with Samba configurations). This also adds NCP file locking if you are using GroupWise and the ConsoleOne Windows management snap-ins.
I also install the C Compiler tools because you never know when you may need them (most notably on VMware, or with the HP ProLiant Support Pack, which sometimes installs drivers that are not part of the stock kernel, so you need the compiler to build kernel modules for those drivers).
Click Accept again.
Click Accept again.
Wait for it to create the partitions
It should reboot and launch the rest of the install
Enter the root password. This should be different from the eDirectory admin password. Click Next.
Uncheck "Change Hostname via DHCP". We don't run DHCP in our server room. Follow your server naming conventions. We use:
Where XX = the node number (01, 02, etc.)
Also, since we are REPLACING a NetWare cluster node, we chose to keep the same server name, so we entered the same name for this OES2 node as the NetWare server it replaces.
Set firewall to disabled (for now).
Also disable IPv6; I've had issues with it in the past.
Click Network Interfaces
On the Blade servers, the first HP NIC is the "primary" one. You should double-check by looking at ILO for the MAC address and comparing to what SLES shows above to make sure you're working on the correct network adapter.
Sometimes Linux assigns the NICs in reverse order (i.e., the second NIC becomes eth0 and the first NIC becomes eth1). Find the MAC address of the NIC and compare it against what Linux finds (click Edit, go to the Advanced section, and verify the hardware address). Otherwise you may THINK the first NIC listed is the primary NIC (eth0) when it is not, and your install fails later because of it. Alternatively, you can disable the secondary NIC in the BIOS and re-enable it later.
Set the IP and netmask. Use the SAME IP and netmask as the NetWare node this server replaces.
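Once you have a shell on the server, the same MAC comparison can be scripted. This is a minimal sketch with hypothetical function names; it assumes the kernel exposes each interface's MAC in /sys/class/net/<interface>/address, and that ILO shows the MAC either colon- or hyphen-separated.

```sh
# Hypothetical helper: normalise a MAC address to lowercase, colon-separated
# form, so "00-1A-4B-3C-2D-1E" and "00:1a:4b:3c:2d:1e" compare equal.
normalize_mac() {
  printf '%s\n' "$1" | tr 'A-F-' 'a-f:'
}

# Hypothetical helper: print the interface whose MAC matches $1 (e.g. the
# address copied from ILO). $2 is the sysfs net directory (default
# /sys/class/net), which holds one "address" file per interface.
find_nic_by_mac() (
  want=$(normalize_mac "$1")
  for f in "${2:-/sys/class/net}"/*/address; do
    [ -r "$f" ] || continue
    if [ "$(normalize_mac "$(cat "$f")")" = "$want" ]; then
      basename "$(dirname "$f")"
    fi
  done
)
```

For example, `find_nic_by_mac 00-1A-4B-3C-2D-1E` prints the interface name (eth0, eth1, ...) that carries the MAC you read from ILO, which tells you directly whether the NICs were enumerated in reverse.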
Click Hostname and Name Server
For the cluster nodes, make sure you use the same name and IP as the previous NetWare node (i.e., co-nc1-svr13 on OES2 keeps the same name and IP it had on NetWare).
Enter the appropriate DNS servers and click OK (double-check that hostname and domain are still correct).
Click the Routing button
Enter the default gateway and click OK (obviously the gateway can differ depending on where the server is installed).
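Because each node keeps its NetWare IP and netmask, it's worth checking that the gateway you entered actually falls inside that subnet. A minimal POSIX sh sketch (the function names are hypothetical) that masks both addresses and compares the network parts:

```sh
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
ip2int() (
  IFS=.          # split the address on dots (subshell keeps IFS local)
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Hypothetical check: succeed if host IP $1 and gateway $3 are in the same
# subnet under netmask $2.
same_subnet() (
  mask=$(ip2int "$2")
  [ $(( $(ip2int "$1") & $mask )) -eq $(( $(ip2int "$3") & $mask )) ]
)
```

With the example values from the boot line, `same_subnet 10.10.1.10 255.255.0.0 10.10.1.1 && echo ok` prints ok.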
YaST seems to list the configured NIC at the top now, even if the one we configured was the second NIC. Click Next.
Select VNC Remote Administration so that it is enabled. We use this so that we can use the NRM (Novell Remote Manager) VNC Consoles option. ILO will work as well, albeit slower (and the mouse cursor has issues until you install the HP drivers). (Note: there are other ways to enable VNC as well.)
We may change the Proxy section later.
I usually skip the test because our firewall policies prevent our servers from accessing the internet.
DO NOT set up the SLES LDAP (OpenLDAP) server with OES. OES uses its own LDAP server (eDirectory), and you CANNOT run OpenLDAP and eDirectory at the same time.
For this, we install into the existing tree. Enter the proper tree name.
I also uncheck "Require TLS for Simple Binds"; leaving it checked tends to cause issues.
We use the IP of our DS Master Replica server.
Enter the admin userid in LDAP format and the password.
Be careful here. The server context defaults to the same container as your admin user. You should use the eDirectory context where the old NetWare server was located. Enter the server context in LDAP format (there's no browse button, so you have to know where the server will be installed). I leave everything else the same.
Enter the information that pertains to your environment. You will probably be using SLP with Directory Agents.
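Since there is no browse button, you have to type names in LDAP format by hand. This sketch converts a typeless dotted eDirectory name into LDAP format. It is an assumption-heavy helper: `edir_to_ldap` is hypothetical, it treats the rightmost part as the Organization (o=), the leftmost part as the object (cn=) only when you pass `leaf`, and everything in between as Organizational Units (ou=). Trees with Country or Locality containers, or names containing escaped dots, need manual conversion.

```sh
# Hypothetical helper: convert a typeless dotted eDirectory name to LDAP
# format. $1: dotted name (a leading dot is ignored). $2: "leaf" if the
# leftmost part is an object (cn=); otherwise the name is a container.
# Assumes the rightmost part is an O and all other containers are OUs.
edir_to_ldap() {
  printf '%s\n' "${1#.}" | awk -F. -v leaf="$2" '{
    dn = ""
    for (i = 1; i <= NF; i++) {
      if (i == NF) attr = "o"
      else if (i == 1 && leaf == "leaf") attr = "cn"
      else attr = "ou"
      sep = (i > 1) ? "," : ""
      dn = dn sep attr "=" $i
    }
    print dn
  }'
}
```

With made-up example names, `edir_to_ldap admin.it.abc leaf` gives cn=admin,ou=it,o=abc for the admin user, and `edir_to_ldap servers.it.abc` gives ou=servers,ou=it,o=abc for a server context.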
I leave these as-is. Click Next.
Then click Next
Now wait a long time for this and iManager to install.
For now we leave this local. Basically this means that any accounts created on this Linux server are ONLY stored on this server (same for passwords). We don't plan on creating other "local only" accounts.
Click Next (we don't define any other local accounts)
Technically at this point, you are finished with the install. However, it is STRONGLY advised that you patch the server before:
Once the server is up and running, and before creating any NSS partitions or enabling multipathing, we need to apply updates. We have set up an SMT (Subscription Management Tool) server on Linux. SMT is a patch "proxy" server that downloads all the patches from the Novell Customer Center (NCC) so that we don't have to configure every server to download them from the internet; instead, we point the servers to the SMT server. Think of it as a "lite" version of Patchlink for Linux/OES2.
Use WinSCP (or whatever method you are comfortable with) to transfer clientSetup4SMT.sh to root's home directory on the SLES/OES2 server. (You can actually put it anywhere on the server; the point is that you need to run the script if you are using SMT.)
Log in to the server (via SSH, VNC, or ILO) as root, then open a terminal window.
chmod +x clientSetup4SMT.sh
./clientSetup4SMT.sh --host slesadmin.abc.com
That's two dashes (with no space between them) in front of host.
Hit Enter and wait
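If you register several nodes against SMT, the chmod and run steps can be wrapped in a small function. This is a sketch only: `run_smt_setup` is a hypothetical name, and it just runs the two commands shown above, making sure the option is passed as --host with two ASCII hyphens.

```sh
# Hypothetical wrapper around the two SMT client commands above.
# $1: directory holding clientSetup4SMT.sh, $2: SMT server host name.
run_smt_setup() (
  script="$1/clientSetup4SMT.sh"
  if [ ! -f "$script" ]; then
    echo "not found: $script" >&2
    exit 1
  fi
  chmod +x "$script" || exit 1
  # Two ASCII hyphens in --host (a pasted en dash will not work).
  "$script" --host "$2"
)
```

For example, `run_smt_setup /root slesadmin.abc.com` after copying the script to root's home directory.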
The icon will normally be orange at this point.
It will usually come up and tell you a few patches to update. Update the items listed.
The default list will contain security patches first, followed by "mandatory/recommended" patches to SLES10 and OES2.
I usually apply those (I believe a reboot is needed).
After that, you'll usually get a globe icon if everything (including optional patches) is installed.
After patching, you may have problems with iPrint Plugins in iManager on OES2 SP2. Check out TID #7005152. You'll have to change the property pages for two objects.
Settings for NSS and Symantec NetBackup:
In order for NBU to work properly with NSS we need to do the following:
You need to edit the /etc/opt/novell/nss/nssstart.cfg file and add the two following statements/lines:
That's a capital "I" in the CtimeIs statement, not a lowercase "l" (the documentation on Symantec's site is difficult to read).
I restart the server after this to ensure that it's loaded.
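Because the l/I mix-up is easy to miss, a quick grep can catch it before you restart. This sketch (hypothetical function name) only looks for the two spellings of the word discussed above; it does not validate the rest of the file, and its default path is the nssstart.cfg location given above.

```sh
# Hypothetical check for the CtimeIs typo. "Ctimels" (lowercase L) is the
# wrong spelling; "CtimeIs" (capital I) is the one to use.
# $1: file to check (defaults to the nssstart.cfg path from this guide).
check_ctime_spelling() (
  cfg=${1:-/etc/opt/novell/nss/nssstart.cfg}
  if grep -n 'Ctimels' "$cfg"; then       # show offending lines
    echo "typo: use a capital I in CtimeIs" >&2
    exit 1
  fi
  grep -q 'CtimeIs' "$cfg" || { echo "no CtimeIs line found in $cfg" >&2; exit 1; }
)
```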
HP ProLiant Support Pack Install
Install the ProLiant Support Pack, reboot, and verify that other items are mounted and work properly.
First ensure that you have fully patched the system.
Then power down the system and connect the secondary boot LUN (assign VDISK in the Magnitude Icon Manager).
Then power the system back up.
I usually run the Partitioner to make sure it sees the LUN twice.
This section is how to enable multi-pathing (MPIO) when booting from the SAN. As you can see, we have two paths.
Now we follow TID 3594167, which requires a fully patched system (that's why I patch first).
So far we've done steps 1-4; now we do step 5.
You may need to implement this!
Edit the /etc/modprobe.conf.local file to ADD the line as shown below:
Why do we do this? In our cluster setup we have non-contiguous LUN numbers from 0 all the way to 64. In order for the Linux OS to see all the LUNs properly, we needed to add the above item. If you have contiguous LUNs, you may not need these settings. It will increase boot/load time by a few seconds (about 3-5 seconds by my timing in my environment).
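To see whether your LUN numbering has gaps, you can list the LUN numbers and look for holes between 0 and the highest one. A minimal sketch (`lun_gaps` is a hypothetical helper); note that if the kernel stopped scanning at a gap, the missing LUNs won't show up locally at all, so the most reliable input is the list of LUN numbers assigned on the SAN.

```sh
# Hypothetical helper: read LUN numbers (one per line) on stdin and print
# every number between 0 and the highest LUN that is missing.
# No output means the LUNs are contiguous.
lun_gaps() {
  awk '
    NF { n = $1 + 0; seen[n] = 1; if (!have || n > max) { max = n; have = 1 } }
    END {
      if (!have) exit                     # empty input: nothing to report
      for (l = 0; l <= max; l++)
        if (!(l in seen)) print l
    }'
}
```

For example, `printf '%s\n' 0 1 2 5 | lun_gaps` prints 3 and 4. On a running system, `ls /sys/class/scsi_device | cut -d: -f4 | sort -un | lun_gaps` checks the LUNs the kernel has already found.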
Open a terminal and type:
(as per step 5)
We skip step 7, because SLES 10 SP3 changes the output of that command.
Edit the /etc/multipath.conf file as recommended by Xiotech (adjust for your SAN vendor):
At the prompt type:
Enter the information as shown:
(note the spaces).
We may change the round-robin policy, but I'm not sure yet.
STEP 8 from the Novell TID:
Reboot the server
Open a terminal prompt and type:
Here's a key for the output of the multipath command:
We need to change a few more items.
Log in to NRM (Novell Remote Manager) on the temporary OES2 server, using the following format:
https://dnsnameofserver.abc.com:8009 (same as it is for NetWare)
You must log in as admin.dec or the "root" user. You cannot log in as yourself just yet.
Click "Manage NCP Services" -> Manage NCP Server
Click the value of "2" next to the OPLOCK_SUPPORT_LEVEL and set to a value of 0 (that's a zero).
NRM will automatically restart the ndsd process to make the change take effect.
New Item on 11/9/10:
Per TID #7004848, we need to set the First Watchdog Packet:
Set it to 5
We also set the maximum cached subdirectories per volume to 500,000. Why? We discovered that, unlike NetWare, OES2 needs this setting increased on very large datasets for the Novell Client to properly see all the files and folders on some volumes. If your clients no longer see all the data after converting from NetWare to OES2 Linux, odds are it's this setting (or the two above it) that needs to be increased. You can also refer to TID #7004888 for more information.
There may be some problems with the watchdog settings at this point. Some people have reported issues after changing the setting to 5 (as it was on NetWare), while we had problems when we left it at zero. You can refer to TID #7004848.
Now click the "configure" icon:
Click the "Edit httpstkd config file"
Scroll all the way down to the bottom and ADD the following two lines:
Then click Save Changes.
You now have two choices. You can either restart the entire server, or restart the following process to make the Email changes take effect:
rcnovell-httpstkd restart (this may take a minute or so)
I like to reboot the server once more at this point and make sure life is good.
Before we begin, you may wish to check a few things.
First, we got bitten by the infamous Panning ID situation. Basically, if your NetWare cluster is at NW 6.5 SP8 and working okay, odds are you do not have this problem. However, if you are at an earlier release, I strongly advise that you apply NW 6.5 SP8 to one node first and reboot it. If the node refuses to join the cluster, odds are you have the Panning ID problem (the gibc.nlm in SP8 was changed from the previous version, and this is how you can tell whether you have the problem). See TID 7001434 for more information (step 7 covers the Panning ID situation).
The BEST method to fix this problem, unfortunately, requires that you shut down ALL your cluster nodes and then start them again. If you have trouble getting your first OES2 SP2 Linux node into your cluster, you may have a Panning ID problem and will have to shut down all the cluster nodes and start them again.
Before I install NCS into the first node, I put a read/write replica of the eDirectory partition that contained the Cluster objects onto the server. I don't believe this is required, but it can help with cluster sync problems.
At this point, I use my SAN utilities to connect the existing disk that hosts my SBD partition to this new OES2 server. You can either reboot the server (probably easiest) or run rescan-scsi-bus.sh, and then verify with multipath -ll that your server sees the SBD partition.
Log in to the physical node as root (VNC or ILO).
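With two paths per LUN, each multipath map should list two path devices. The sketch below counts them from multipath -ll output. Because the output format changed between releases (as noted earlier for SLES 10 SP3), it relies on only two assumptions: each map's header line starts in the first column with the map name, and each path line contains a SCSI host:channel:target:LUN address followed by an sd device name. The function name is hypothetical.

```sh
# Hypothetical helper: read "multipath -ll" output on stdin and print each
# map name with the number of path lines (H:C:T:L followed by an sd device)
# listed under it.
count_paths() {
  awk '
    NF == 0 { next }
    {
      c = substr($0, 1, 1)
      # Indented lines, [size...]/size= lines and \_ | ` tree lines belong
      # to the current map; count the ones that name a path device.
      if (c == " " || c == "\t" || c == "[" || c == "\\" || c == "|" || c == "`" || $0 ~ /^size=/) {
        if (map != "" && $0 ~ /[0-9]+:[0-9]+:[0-9]+:[0-9]+[ \t]+sd[a-z]+/) paths[map]++
        next
      }
      map = $1; order[++m] = map; paths[map] = 0   # new map header
    }
    END { for (i = 1; i <= m; i++) print order[i], paths[order[i]] }'
}
```

`multipath -ll | count_paths` should show 2 next to every map in a dual-path setup like ours; a 1 means a path is missing.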
Click Computer -> Yast -> Open Enterprise Server -> OES Install and Configuration:
Check the box next to Novell Cluster Services so that it has a black checkbox.
Wait a few minutes for the files to install (about 2 minutes)
Then a few post-install screens will run (MiCASA, etc.)
At the OES Configuration screen (wait for it to build), click the "disabled" link underneath LDAP Configuration for OES. To be honest, I am not 100% sure this step is necessary, but the Novell docs say to do it.
Generally speaking: There will usually be two IPs here. One is the IP of the actual cluster node you are working on, and the other is probably the IP of a server with replicas on it. In our setup, we have three dedicated LDAP servers that contain replicas of our entire tree. As such, I adjusted the lines here so that I had FOUR IP addresses listed.
The local IP
And the 3 other "remote" IP of our other LDAP servers with replicas on them.
Your environment may vary.
Now click the LDAP Configuration for Open Enterprise Services link.
Enter the "admin" password.
Click ADD if you wish to add additional LDAP servers.
Scroll down and click the "disabled" link underneath Novell Cluster Services (NCS)
Now click the Novell Cluster Services (NCS) Link.
10.10.1.230 is the IP of our LDAP server that contains ALL replicas. I would try to avoid using the local IP (the 10.10.1.10 above) unless it has replicas of your partitions on it.
Also, there used to be a bug that made the Cluster FDN case sensitive. In other words, if your eDirectory Cluster object was CLUSTER1, you had to enter it EXACTLY as it appeared in eDirectory. While this is supposedly fixed, I made sure the case matched anyway.
BE CAREFUL here and make sure you specify the proper context in LDAP format.
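If you want to be sure the case matches, compare what you typed with the object name as eDirectory shows it. A small sketch with a hypothetical function name: it returns 0 for an exact match, 2 when the names differ only in case, and 1 otherwise. The FDNs in the usage below are made-up examples.

```sh
# Hypothetical check: compare the Cluster FDN you typed ($1) with the name
# exactly as eDirectory shows it ($2).
# Exit status: 0 exact match, 2 differs only in case, 1 different names.
check_fdn_case() (
  [ "$1" = "$2" ] && exit 0
  typed=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  actual=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  if [ "$typed" = "$actual" ]; then
    echo "case differs: enter '$2' exactly" >&2
    exit 2
  fi
  echo "FDNs differ: '$1' vs '$2'" >&2
  exit 1
)
```

For example, `check_fdn_case cn=cluster1,ou=it,o=abc cn=CLUSTER1,ou=it,o=abc` reports that only the case differs.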
Make sure that "Existing Cluster" is selected.
Make sure the IP listed is the correct IP of the physical node you are installing this on.
UNCHECK the "Start Clustering Services now" and click Finish
Click Next at the next screen.
Click Configure Later and click Next (we use SMT, and we already patched the server).
Open a terminal and type:
This should display:
Notice that the cluster object name IS in uppercase (NCS1 vs. ncs1). This verifies that the server does see the SBD disk partition. Your cluster name will probably vary.
Reboot the server.
It may take several minutes AFTER the server is rebooted and loaded for the iManager -> Clusters -> Cluster Manager to show the green dot on the server object:
A yellow dot marks the node that is running the master cluster IP resource.
A server that is NOT in the cluster has no dots at all.
Note that some of my servers are UPPERCASE and some are lowercase. This is because I entered the server name in lowercase during the OES2 Linux installation. As you reboot the master node, it "updates" the server names, and eventually all your servers show up in lowercase (if you used lowercase, that is). I have not noticed any harm from the change of case in the server names.
At this point, we are ready to begin the actual Rolling Cluster Upgrade of the services themselves. That is in part 2.