Clusters, Migration Wizard, and Backups

If you have been using Novell Cluster Services on NetWare (or perhaps you call it Open Enterprise Server, NetWare kernel), then you may be familiar with an issue that can arise with backups.

Novell has a nice backup infrastructure model called SMS (Storage Management Services), which uses a TSA (Target Service Agent) to do its dirty work.

There are several TSAs. TSANDS is the TSA for NDS/eDirectory. TSAGW is the GroupWise TSA. For the file system it used to be TSA410 (for NetWare 4.x servers), TSA500 (for NetWare 5.x servers) and TSA600 (for NetWare 6.x servers). With later releases of NetWare 6.5, Novell released TSAFS, the TSA for File Systems.

TSAFS is miles faster than TSA600, and if you can use it, you should!

Performance is much better, it is multithreaded, and there are many configuration options you can use to tune it.

You can control the block size it uses to handle data, to try to match it to your backup software. You can control the number of threads doing reads to increase performance. (Obviously there are trade-offs: 4 threads are better than 1, but 100 threads is probably not such a great plan, as you will saturate the disk channel completely.)
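As a rough way to picture that trade-off, the sketch below models read throughput as growing with the thread count until the disk channel is saturated. This is a toy model, not a description of how TSAFS actually behaves; the function name and every number in it are made up for illustration.

```python
def estimated_throughput(threads, per_thread_mb_s, channel_limit_mb_s):
    """Toy model: throughput scales linearly with threads until the
    disk channel limit is reached, then stays flat."""
    if threads < 1:
        raise ValueError("need at least one read thread")
    return min(threads * per_thread_mb_s, channel_limit_mb_s)


# Hypothetical figures: 20 MB/s per read thread, 120 MB/s disk channel.
for threads in (1, 4, 100):
    print(threads, estimated_throughput(threads, 20, 120))
```

In this model, going from 1 to 4 threads helps a lot, while everything past the saturation point adds contention without adding throughput. Real numbers have to come from testing on your own hardware.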

One of the features it offers, which TSA600 has as well, is Cluster volume support.

When you look at a server's available resources via a backup program, each TSA presents the set of resources that it can see.

Cluster volumes by default show up differently than regular volumes, which makes sense since you want to handle them differently.

For a while there, not all backup packages properly supported Cluster volumes via TSA. As a consequence, a lot of people would load TSAFS with the /nocluster switch.

This makes TSAFS present the cluster volumes currently mounted on the local node as if they were regular file system volumes. That means if a cluster volume migrates elsewhere, it will no longer be backed up on this node. It means you need to make sure all cluster nodes are backing up all their local volumes, so you get a backup on whatever node a volume happens to be mounted on when the backup window comes around. It also means that incrementals do not save as much space: if you do a full on Node1 and the volume then migrates to Node2, the next backup will probably look like a full again.
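To see why a migrated volume tends to look like a brand new volume, here is a minimal sketch. It assumes the backup software keys its catalog on the resource name the TSA presents; real backup products track this in their own ways, and the names used here (NODE1, NODE2, VOL1, CLUSTER_VS) are hypothetical.

```python
def resource_name(volume, node, cluster_aware, virtual_server="CLUSTER_VS"):
    """Name under which the volume is presented to the backup software.

    Cluster-aware: the name follows the (hypothetical) virtual server,
    so it stays the same wherever the volume is mounted.
    With /nocluster: the name follows the physical node hosting it.
    """
    if cluster_aware:
        return f"{virtual_server}/{volume}"
    return f"{node}/{volume}"


def plan_backup(catalog, resource):
    """Incremental if this resource already has a full, otherwise a full."""
    if resource in catalog:
        return "incremental"
    catalog.add(resource)
    return "full"


def simulate(cluster_aware):
    """VOL1 is on NODE1 the first night, then migrates to NODE2."""
    catalog = set()
    hosting_nodes = ["NODE1", "NODE2"]
    return [
        plan_backup(catalog, resource_name("VOL1", node, cluster_aware))
        for node in hosting_nodes
    ]


print(simulate(cluster_aware=False))  # ['full', 'full']
print(simulate(cluster_aware=True))   # ['full', 'incremental']
```

With /nocluster the second night's backup sees NODE2/VOL1, which it has never backed up before, so it starts over with a full.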

Regardless, sometimes this is your only option. Nowadays, tape is cheap, bandwidth is cheap, might as well just do fulls all the time, right? If you have no other choice, what can you do? Just accept the extra costs.

If you have your cluster up and running in this mode, and you want to migrate a new server into the cluster using a tool like the Novell Server Consolidation and Migration Toolkit, you may run into a problem.

The Server Consolidation Toolkit uses TSA to move files between servers.

This is a good thing, as the communication is all server to server, not via an intermediary workstation. The metadata of the backup flows through the workstation, but compared to the data of the backup (gigabytes to terabytes) the metadata is quite small (usually less than a couple of megabytes), which makes this a much more efficient way to do it. Additionally, TSA can back up all the namespace, ownership, attribute, and trustee information in one fell swoop, just like a regular backup can.

The Server Consolidation Toolkit is smart and recognizes that your volume is actually a cluster volume, and since it is using TSA, it tries to look for the Cluster TSA resource to connect to.

A common problem we have seen is that if you are running TSAFS /nocluster, the Server Consolidation Toolkit cannot connect, because it will not see any Cluster TSA resources.

The fix is pretty simple, but might be tricky to do!

UNLOAD SMDR
UNLOAD TSAFS
LOAD SMDR
LOAD TSAFS /cluster

Now you can try it again, and the Server Consolidation Toolkit should connect and be able to migrate the data. The tricky part comes when the time it takes to migrate the data runs into the backup window and your backups fail, since they are looking for local volumes in the TSA resource list but are now seeing Cluster TSA volume resources. Be careful with your timing, and consider cluster migrating the volumes to another node for the duration.
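A quick back-of-the-envelope check can tell you whether a migration is likely to run into the backup window. The sketch below is only an estimate under stated assumptions: the data size, the sustained throughput, and the dates and times are all hypothetical, and real transfers rarely hold a constant rate.

```python
from datetime import datetime, timedelta


def migration_end(start, data_gb, throughput_mb_s):
    """Estimated finish time, assuming a constant transfer rate."""
    seconds = data_gb * 1024 / throughput_mb_s
    return start + timedelta(seconds=seconds)


def overlaps(start_a, end_a, start_b, end_b):
    """True if the two time ranges share any time at all."""
    return start_a < end_b and start_b < end_a


# Hypothetical plan: start at 18:00, backup window from 22:00 to 04:00.
start = datetime(2024, 1, 10, 18, 0)
backup_start = datetime(2024, 1, 10, 22, 0)
backup_end = datetime(2024, 1, 11, 4, 0)

for data_gb in (500, 1000):
    end = migration_end(start, data_gb, throughput_mb_s=40)
    clash = overlaps(start, end, backup_start, backup_end)
    print(data_gb, "GB finishes", end, "clashes with backup:", clash)
```

With these made-up figures, 500 GB finishes before the backup window opens, while 1000 GB would still be running after it starts, which is the case where you would want to reschedule or cluster migrate the volumes to another node first.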

