Shared SCSI Two-Node Cluster

Using a Dell PowerVault 220S for a two-node shared cluster:

NetWare 6.5 includes a two-node license for Novell Cluster Services. NCS is a great clustering solution, but the buy-in cost can be quite high.

The key issue is the need for shared storage. Early in the game, at least, this was meant to be Fibre Channel (FC). The problem is that FC can be quite expensive, even today.

There are a couple of ways to minimize that cost. One is to buy a SAN whose licensing can be stripped down to allow only two directly attached nodes, perhaps without support for a switch, and then upgrade the licensing when the money becomes available. However, this can still be more expensive than you might hope. (Dell may have some low-end EMC devices they could offer like this.)

Another option is to use iSCSI to connect each node to the shared storage on a third device. This is probably the cheapest solution since any NetWare or Linux box with some attached storage can provide iSCSI services. This will work especially well if you already own a proper storage device that will speak iSCSI.

If you repurpose an existing server, you inherit its existing redundancy (or lack thereof). Once you start adding up the costs, it can get expensive quickly.

The purpose of this article is to discuss a reasonable, less costly alternative.

There is a clustering course out there that discusses how to use two SCSI cards to connect to a single SCSI drive. This is known as a shared-SCSI cluster.

The downside to this model is that it is a single-disk solution. You will definitely want some kind of redundancy in the storage.

There is a specific hardware solution that can do shared SCSI with RAID. I have done this with one very specific set of hardware, which I will describe, but in principle it could be done with any hardware that supports it.

Dell sells an external SCSI enclosure, the PowerVault 220S (the 221S is the tower model). It has 14 drive bays for 3.5" U320 SCSI hard drives.

Dell also sells an appropriate RAID controller, the PERC 4DC or 4eDC controller line (the "e" stands for PCI Express).

The unique thing about the PERC 4DC line is that it supports Cluster mode for Shared SCSI. This controller is based upon the LSI (formerly AMI) chipset, so it is likely there are other controllers from other vendors that support this as well.

The tricky part is getting the configuration right. The first thing to do is make sure that the RAID set's cache mode is set to WRTTHRU, not WRTBACK. This is the single biggest downside to this proposed solution: you cannot let the controller cache writes. That makes sense. If one controller is writing to the disk array and its node crashes before the cache is flushed to disk, you pretty much guarantee data corruption.

Higher-end solutions, usually SANs, mirror the write cache between controllers to avert this issue. The downside there is that you use twice as much cache, since every cached write is held in both controllers.
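To see why write-back caching is unsafe here, consider the toy Python model below. The SharedDisk and Controller classes are hypothetical and are not how the PERC firmware works. Node A acknowledges a write while the data sits only in its controller's cache, then crashes before flushing. The surviving node can only read the shared disks. Even if the dead node's cache contents were preserved, the other node could not see them, and that is exactly the gap a mirrored cache closes.

```python
# Toy model of two cluster nodes sharing one disk array.
# Hypothetical classes for illustration only; not the PERC implementation.

class SharedDisk:
    """The physical disks in the enclosure, visible to both controllers."""
    def __init__(self):
        self.blocks = {}

class Controller:
    def __init__(self, disk, write_back):
        self.disk = disk
        self.write_back = write_back
        self.cache = {}  # dirty blocks not yet written to disk

    def write(self, block, data):
        if self.write_back:
            # WRTBACK: acknowledge as soon as the data is in controller cache.
            self.cache[block] = data
        else:
            # WRTTHRU: acknowledge only after the data is on disk.
            self.disk.blocks[block] = data

    def crash(self):
        # The node dies; its unflushed cache is gone (or at least unreachable).
        self.cache.clear()

    def read(self, block):
        # Serve from this controller's own cache first, then from the disks.
        return self.cache.get(block, self.disk.blocks.get(block))


def failover(write_back):
    disk = SharedDisk()
    node_a = Controller(disk, write_back)
    node_b = Controller(disk, write_back)
    node_a.write(0, "committed transaction")  # the application sees success
    node_a.crash()                            # before any flush to disk
    return node_b.read(0)                     # what the surviving node sees


print("WRTBACK:", failover(write_back=True))   # None: the acknowledged write is lost
print("WRTTHRU:", failover(write_back=False))  # the data survives the failover
```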

In LSI controller parlance, WRTTHRU means do not use write caching. WRTBACK is the caching mode, but we cannot use it in cluster mode. RAID 5, alas, really, really wants write caching. This is because it is much more efficient for the controller to cache a batch of writes and then, when it is ready, do the math to generate the parity and write it all out at once. In WRTTHRU mode the controller has to calculate parity for each operation and write it out before moving on to the next write. This is much slower than the disks could otherwise deliver. If you are deploying a cluster for a write-heavy environment, this is the wrong approach for you.
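A small Python sketch of the parity arithmetic shows where the cost comes from. RAID 5 parity is the XOR of the data blocks in a stripe, so changing one block in place means reading the old data and old parity, then writing the new data and new parity. The io_count helper is a simplified, hypothetical model. It assumes every write-through write is a single-block read-modify-write and every cached write is coalesced into a full-stripe write, which real controllers only approximate. Twelve data disks corresponds to one RAID 5 set across the 13 usable drives in this enclosure.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID 5 parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across four data disks (toy 4-byte blocks) and its parity block.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd", b"\x0f\x0f\x0f\x0f"]
parity = xor_blocks(*stripe)

# Write-through small write: update one block in place (read-modify-write).
# The controller reads the old data and old parity, then writes both back,
# so one logical write costs 2 reads + 2 writes.
new_block = b"\xff\xee\xdd\xcc"
new_parity = xor_blocks(parity, stripe[1], new_block)
stripe[1] = new_block
assert new_parity == xor_blocks(*stripe)  # same as recomputing from scratch

def io_count(data_disks: int, batched: bool) -> int:
    """Disk I/Os needed to rewrite one full stripe of `data_disks` blocks."""
    if batched:
        # Write-back cache collected the whole stripe: compute parity once and
        # write every data block plus one parity block, with no reads.
        return data_disks + 1
    # Write-through: each block is its own read-modify-write (4 I/Os).
    return 4 * data_disks

print(io_count(12, batched=False), io_count(12, batched=True))  # 48 13
```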

From personal experience: using backup software to burst data over the network to a cluster, we found that a PERC 3DC controller could write about 8 GB an hour, and a PERC 4DC controller about 12 GB an hour. When we moved to an EMC CX300 SAN, with the controllers caching and mirroring their 1 GB of cache, we could write 80 GB/hour sustained. (Our test files were not ideal: workstation images, a great many 2 GB files. But hey, where else are you going to get 400 GB of data to test with?) At that point the network was the bottleneck, and a single Gigabit Ethernet card was not enough to go beyond 80 GB/hour.
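To put these measured rates in more familiar units, the short Python sketch below converts GB/hour to MB/s (decimal units, 1 GB = 1000 MB, an assumption since the original figures do not specify) and estimates how long the 400 GB test set would take at each rate. It works out to roughly 2.2, 3.3 and 22.2 MB/s, or about 50, 33 and 5 hours respectively.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour: float) -> float:
    """Convert a GB/hour rate to MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000 / 3600

def hours_to_write(total_gb: float, gb_per_hour: float) -> float:
    """Hours needed to write total_gb at a sustained gb_per_hour."""
    return total_gb / gb_per_hour

# Rates measured in the backup test described above.
rates_gb_per_hour = {"PERC 3DC": 8, "PERC 4DC": 12, "EMC CX300": 80}
test_set_gb = 400

for name, rate in rates_gb_per_hour.items():
    print(f"{name:10} {gb_per_hour_to_mb_per_s(rate):5.1f} MB/s "
          f"{hours_to_write(test_set_gb, rate):5.1f} h for {test_set_gb} GB")
```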

Setting the cache mode is done through the SCSI BIOS for the controller. Most of the needed changes can be made together with a single reboot, so you might as well make them from the BIOS screen. However, you can make some of them from the DELLMGR.NLM interface (which looks identical to the BIOS interface, but runs as an NLM at the NetWare console) if you need to prepare in advance for an outage.

Pressing Ctrl-M at boot time gets you into the BIOS screen. Go to Objects, then Logical Drives (if you have one configured already). Select the logical drive (this is Dell/LSI speak for the configured RAID set) and, from the Settings menu, change the caching mode from WRTBACK to WRTTHRU.

Next we need to set the card into Cluster mode.

Back at the Objects menu, select the Adapter. One of the options is Cluster mode; choose to enable it.

Finally, the last thing we need to do within the controller is make sure the SCSI IDs within the storage enclosure are unique. There are two controllers, and by default each of them uses SCSI ID 7, so one of them needs to be switched to ID 6. Whichever one you choose, mark it clearly, because after the fact, in the middle of troubleshooting a problem, it is very hard to tell for certain. Mark it on the case or in your documentation.

This will require a reboot to take effect, so do this step last.
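Before moving on, it can help to write down what you set on each card and check it. The Python sketch below is a hypothetical checklist: the ControllerSettings fields and check_controllers function are made up for illustration and record values you copy by hand from Ctrl-M or DELLMGR.NLM; nothing here reads from the controller. It flags the three settings described above: WRTTHRU caching, cluster mode, and distinct SCSI IDs.

```python
from dataclasses import dataclass

@dataclass
class ControllerSettings:
    """Hypothetical hand-filled record of one node's PERC settings."""
    node: str
    cache_mode: str     # "WRTTHRU" or "WRTBACK"
    cluster_mode: bool  # Cluster mode enabled on the Adapter
    scsi_id: int        # SCSI ID this controller uses on the shared chain

def check_controllers(a: ControllerSettings, b: ControllerSettings) -> list[str]:
    """Return a list of problems; an empty list means the pair looks right."""
    problems = []
    for c in (a, b):
        if c.cache_mode != "WRTTHRU":
            problems.append(f"{c.node}: cache mode is {c.cache_mode}, must be WRTTHRU")
        if not c.cluster_mode:
            problems.append(f"{c.node}: cluster mode is not enabled")
    if a.scsi_id == b.scsi_id:
        problems.append(f"both controllers use SCSI ID {a.scsi_id}; change one (e.g. to 6)")
    return problems

# Out of the box both cards default to ID 7, so this reports one conflict.
print(check_controllers(
    ControllerSettings("NODE1", "WRTTHRU", True, 7),
    ControllerSettings("NODE2", "WRTTHRU", True, 7),
))
```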

Now that the controllers are ready to go, we need to make sure the PowerVault 220S is running in cluster mode. Power down the attached node, and then turn off the PowerVault. In the very middle of the backplane is the overall controller for the external enclosure, and it has a switch that controls how the backplane is configured. The three options are the top position, for a single backplane; the middle position, for a split backplane (7 drives on each of two separate SCSI chains); and the bottom position, cluster mode, where each controller (EMM, Enclosure Management Module in Dell speak) is cabled to a different node and both nodes share the same chain of drives.

One further downside to this solution is that, once cluster mode is enabled, the 14th drive falls off the edge of the world. Wide SCSI supports 16 IDs on a single chain. (Technically, with LUNs we should be able to use many more, but for the moment 16 is the useful maximum.) The enclosure supports 14 drives, and the backplane also needs a SCSI ID (15) on the chain, so that is 15 IDs potentially in use. With cluster mode enabled there are also two SCSI controllers, each with its own ID (6 and 7), which would make 17 IDs, and that is not allowed. Something must go: Dell chose to drop the last drive, limiting the enclosure to 13 usable drives. With 300 GB drives available, that is still a fair amount of space.
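The ID budget above can be checked with a few lines of Python. This sketch assumes the drives simply take whatever IDs are left after the two controllers (6 and 7) and the backplane (15); the actual slot-to-ID mapping inside the enclosure is not covered here. The capacity figures assume 300 GB drives and, for the RAID 5 number, a single RAID 5 set across all 13 drives with no hot spare.

```python
# SCSI ID budget on one wide (16-ID) chain with the PowerVault in cluster mode.
SCSI_IDS = range(16)       # IDs 0-15
BACKPLANE_ID = 15          # enclosure backplane (shows up as "Dell PV22x")
CONTROLLER_IDS = {6, 7}    # the two PERC cards, one per node

reserved = CONTROLLER_IDS | {BACKPLANE_ID}
drive_ids = [i for i in SCSI_IDS if i not in reserved]

print(len(drive_ids), drive_ids)
# 13 [0, 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14]

# Capacity with 300 GB drives (assumes one RAID 5 set, no hot spare).
DRIVE_GB = 300
raw_gb = len(drive_ids) * DRIVE_GB            # 3900 GB raw
raid5_gb = (len(drive_ids) - 1) * DRIVE_GB    # 3600 GB usable after parity
print(raw_gb, raid5_gb)
```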

The PERC 4DC has two external channels, and you can add a second enclosure on the second channel on each controller to add another 13 disks to the solution.

Once this is done, power everything back up. If you now go into the BIOS via Ctrl-M at boot time, you will get a scary warning not to do this if both nodes are active. In general this is true, but basically it is saying that you should be careful not to make changes from both ends at once. Generally good advice.

Set up your disk array as you want it.

Now for the easy part. Install NetWare 6.5 on each node and get them ready to go, all configured and happy. Then, using the install media, run NWDeploy and install clustering on the external array.

The last tricky bit is to make an NSS volume on the array (that is the easy part) and, using NSSMU, go to the Devices menu and press F6 to mark the array as shared. This means NSS will allow two controllers to access it, but will also mediate between them and allow only one access at a time. You can imagine the chaos that would ensue if two controllers were allowed to mount the same volume at the same time. There are file systems out there that support that (such as OCFS on Linux), but they are not the norm, and NSS is not one of them.
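The "shared, but only one node at a time" rule can be pictured with the toy Python model below. SharedPool and its activate/deactivate methods are hypothetical names invented for this illustration; NSS and NCS do their own mediation internally, and this is not their implementation. The point is only the invariant: a shared device may have at most one active owner, and another node can take it over only after the first one has let it go.

```python
# Toy model of exclusive ownership of a shared pool.
# Hypothetical class; not how NSS/NCS implement their mediation.

class SharedPool:
    def __init__(self, name):
        self.name = name
        self.owner = None  # node that currently has the pool active

    def activate(self, node):
        if self.owner not in (None, node):
            raise RuntimeError(f"{self.name} is already active on {self.owner}")
        self.owner = node

    def deactivate(self, node):
        if self.owner == node:
            self.owner = None


pool = SharedPool("DATA")
pool.activate("NODE1")
try:
    pool.activate("NODE2")   # refused: NODE1 still has the pool active
except RuntimeError as err:
    print(err)               # DATA is already active on NODE1
pool.deactivate("NODE1")     # e.g. the resource is migrated off NODE1
pool.activate("NODE2")       # now NODE2 may mount it
print(pool.owner)            # NODE2
```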

After that you have a regular looking two node cluster.

Some things to keep in mind for troubleshooting.

Make sure the controller BIOS is up to date and the SAME on both controllers. Dell provides updates in several formats: one that assumes you have Windows installed, one for Linux, and one that runs from a bootable DOS disk. Use the last type for this setup.

There is one final BIOS whose version you need to confirm. The EMMs (Enclosure Management Modules), which the SCSI cables actually plug into, maintain their own BIOS as well. The easiest way to find the version is through the Ctrl-M BIOS interface or DELLMGR.NLM: under Objects, Physical Drive, look at SCSI ID 15, and it will show Dell PV22x and a version number. E.19 was the latest at the time of this writing. There were definite bugs and issues in older versions of this BIOS. Again, make sure both EMMs are at the same version. If there is a newer version and an update is needed, it is a little bit tricky: you need to find a Windows box with an external SCSI port, install the update software on Windows, and update each EMM separately. There is no DOS-based updater for this one.

Just as important, make sure the SCSI cables from each controller to the array are the same length. SCSI can be timing dependent, and a difference in cable length, especially between a 1 meter and a 6 meter cable, can actually cause problems.
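The three troubleshooting checks above all come down to "both sides must match," which a short Python sketch can make explicit. The NodeInventory record and mismatches function are hypothetical, filled in by hand from Ctrl-M or DELLMGR.NLM (the EMM version at SCSI ID 15) and a look at the cables. The BIOS string "BIOS-X" and the EMM version "E.17" below are made-up sample values, not real release numbers.

```python
from dataclasses import dataclass

@dataclass
class NodeInventory:
    """Hypothetical hand-filled inventory for one side of the cluster."""
    node: str
    perc_bios: str     # controller BIOS version
    emm_firmware: str  # EMM version shown at SCSI ID 15, e.g. "E.19"
    cable_m: float     # SCSI cable length from this controller to its EMM

def mismatches(a: NodeInventory, b: NodeInventory) -> list[str]:
    """List every field that differs between the two sides."""
    problems = []
    for field in ("perc_bios", "emm_firmware", "cable_m"):
        va, vb = getattr(a, field), getattr(b, field)
        if va != vb:
            problems.append(f"{field}: {a.node}={va} {b.node}={vb}")
    return problems

# Sample values: one EMM lagging behind and a 1 m vs 6 m cable pair.
print(mismatches(
    NodeInventory("NODE1", "BIOS-X", "E.19", 1.0),
    NodeInventory("NODE2", "BIOS-X", "E.17", 6.0),
))
```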

In my experience this solution works well; we ran it with thousands of simultaneous users for several years. The better solution is to get an entry-level SAN of some kind and run it over Fibre Channel. Performance is significantly better with a Fibre Channel SAN behind the cluster.

Having said all that, if you need a cheap way to get into clustering, this solution is worth considering.

