Moving SAN - Cluster split brain and DHCP

Hey Guys,

We are mid SAN migration, and someone has kindly pointed out that, as well as the NSS volumes we had hosted on the old SAN (which we have now migrated), the old SAN is also running DHCP and, more scarily, the split brain for our cluster (3 nodes, oes2sp2 sles10 sp3).

I've found the following link, which frighteningly makes moving the split brain look fairly straightforward:

www.novell.com/.../viewContent.do
  • We moved from a Xiotech SAN to a Compellent SAN

    I did two different methods that worked

    One was the native "import" feature that Compellent offered. This required downtime, however. With that method, you offline the resource (or, in the case of the SBD, power down the entire cluster) and it copies the non-empty sectors from the one SAN to the other SAN's volume. Then we had to re-attach/link the Fibre Channel connections to the new SAN. We kept the LUN numbers the same and, of course, disconnected the old "Vdisk/LUN" from the Xiotech SAN.

    Powered things back up and voila. Once the SBD was copied over, we could more easily offline the desired resource, disconnect that LUN/Vdisk from the servers, and then copy/import it to the new SAN, re-connect and then we rebooted the physical nodes to ensure that it did a complete SCSI bus rescan of the appropriate LUNs, etc.

    I also did a few trial runs using Clonezilla. That worked, but it was hideously slow (IMO) on EVMS/NSS volumes compared to the Compellent method (then again, I had to copy the entire disk regardless of whether there were empty sectors or not).

    DHCP shouldn't matter to be honest, since it's just a cluster resource (I don't think it's linked to an actual NSS volume, but rather the root partition on the physical Linux node), so moving from one SAN to another shouldn't involve the DHCP cluster resource, IMO. Now, if you made an actual NSS volume to "link" it to the DHCP cluster resource (like if you put iPrint on NSS on OES2) then treat it just like any other of your cluster resource volumes that you have migrated already.

    If you're mid-migration, and you already moved all your NSS volumes, then really the only thing left is the SBD partition and possibly your Boot volumes if you boot from SAN (we do).

    In that case, it gets REAL fun. Even with the Compellent move, because we boot from SAN, that obviously changes the "guid" (or whatever it's called) of the SCSI LUN, so I had to boot each physical node up from the rescue media, manually edit the /etc/fstab and menu.lst files, and rebuild the ramdisk so that it had the correct disks. In other words, if you partitioned your LUN0 as:

    /boot
    swap
    /

    You'll have something like this in your /etc/fstab:

    /dev/disk/by-id/bighairynumber for /boot
    and so on

    When you transfer/copy to the new SAN, that bighairynumber will change, so you need to adjust that.
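
    A minimal sketch of that adjustment, for anyone who wants to script it rather than edit by hand. The function name `swap_disk_id` and the IDs in the usage comment are made up for illustration, and the /mnt mount point is an assumption about how the rescue system is set up. It only prints the rewritten file, so nothing changes until you have reviewed the output.

    ```shell
    # Sketch only: print FILE with every literal OLD_ID replaced by NEW_ID.
    # Nothing is written back; review the output before replacing the real file.
    # To see the IDs the new SAN presents, run: ls -l /dev/disk/by-id/
    swap_disk_id() {
        awk -v old="$1" -v new="$2" '
        {
            out = ""
            # index() matches literally, so dots in an ID are not treated as wildcards
            while ((i = index($0, old)) > 0) {
                out = out substr($0, 1, i - 1) new
                $0 = substr($0, i + length(old))
            }
            print out $0
        }' "$3"
    }

    # Hypothetical IDs for illustration; run from the rescue system, where the
    # root filesystem of the node is assumed to be mounted at /mnt:
    # swap_disk_id scsi-OLDID scsi-NEWID /mnt/etc/fstab > /tmp/fstab.new
    ```

    The same function should work on menu.lst. The ramdisk still has to be rebuilt afterwards (on SLES that is normally done with mkinitrd).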

    But if you're NOT booting from SAN, then no worries.
  • Thanks for taking the time to write that kjhurni,

    Unfortunately I was on a deadline with our SAN migration, and people wanted the resources moved ASAP, so I didn't get a chance to experiment with copying the split brain. In the end I just held my breath and followed the TID. Deleting the old split brain first was a bit terrifying, but as it turns out the TID was spot on and actually quite easy.

    Thankfully we don't boot from SAN either, so we managed to avoid that scary scenario.

    The only resource I have left on the old SAN now is DHCP. It looks like the contractor who set it all up for us some years ago assigned it its own NSS volume. Now, I'm happy with moving the files to the new SAN, but does anyone know what I'd have to change on the existing cluster resource? Is it even possible, or will I have to re-create it? Can I just change the load/unload scripts? (I'm betting it's not that easy.)

    Thanks Guys
    Dave
  • hmmm, it's possible that the person who set it up just created an "empty" NSS volume??

    I don't honestly remember if you can set up OES2 DHCP to write the data/conf info to an NSS volume.

    If you don't mind posting your load/unload scripts (feel free to change the IP and server names for any "incriminating" evidence) that may help me figure out what the person did.

    Do you know what contents (if any) are on the NSS volume for the DHCP "server"?
  • Thanks again kjhurni,

    I've been reading the docs on OES2 and there is a section on "Configuring DHCP with Novell Cluster Services for the NSS File System", but I find Novell docs a really dry read and I'm just struggling with it a bit. The Novell docs go on about running a script that creates a file structure on the NSS volume, which is present on our DHCP volume, but I'm hoping I can just move it to the new volume.

    Our load script for DHCP looks like:
    #!/bin/bash
    . /opt/novell/ncs/lib/ncsfuncs
    exit_on_error nss /poolact=DHCP
    exit_on_error ncpcon mount DHCP=252
    exit_on_error add_secondary_ipaddress 10.198.110.3
    exit_on_error ncpcon bind --ncpservername=C1_DHCP_SERVER --ipaddress=10.198.110.3
    exit_on_error /opt/novell/dhcp/bin/cluster_dhcpd.sh -m /media/nss/DHCP
    exit 0

    And the unload:
    #!/bin/bash
    . /opt/novell/ncs/lib/ncsfuncs
    ignore_error killproc -p /var/run/dhcpd.pid -TERM /usr/sbin/dhcpd
    ignore_error ncpcon unbind --ncpservername=C1_DHCP_SERVER --ipaddress=10.198.110.3
    ignore_error del_secondary_ipaddress 10.198.110.3
    ignore_error nss /pooldeact=DHCP
    exit 0

    Thanks
    Dave
  • Okay, so I see what they're doing now. Something similar to the iPrint relocation script.

    Okay, you have a few choices here.

    My GUESS is that this is a small volume (I can't imagine an NSS volume for DHCP having lots of data on it).

    You could certainly use the miggui to migrate it to a new cluster resource, which of course will change the NSS pool names, Volume, and IP address. So you'd have like:

    /poolact=DHCPNEW
    ncpcon mount DHCPNEW=somenumber
    and so on.

    However, I'm not sure if additional changes would be necessary.
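
    To make the renaming concrete, here is a rough sketch that prints a load script modelled line for line on the one posted above, with the pool/volume name, volume ID and IP address as parameters. The function name `gen_dhcp_load_script` and the volume ID 251 are made up for illustration. It assumes the pool and volume share one name, that the NCP server name stays C1_DHCP_SERVER, and that nothing else in the DHCP setup has to change, none of which is confirmed here.

    ```shell
    # Sketch only: print a load script for a renamed pool/volume, modelled
    # line for line on the load script posted above.
    # Assumptions: pool and volume share one name, the NCP server name is
    # unchanged, and no other DHCP settings need to change. None of this is
    # confirmed in this thread.
    gen_dhcp_load_script() {
        pool=$1      # new pool and volume name, e.g. DHCPNEW
        vol_id=$2    # volume ID for the ncpcon mount line
        ip=$3        # secondary IP address of the resource
        printf '%s\n' \
            '#!/bin/bash' \
            '. /opt/novell/ncs/lib/ncsfuncs' \
            "exit_on_error nss /poolact=$pool" \
            "exit_on_error ncpcon mount $pool=$vol_id" \
            "exit_on_error add_secondary_ipaddress $ip" \
            "exit_on_error ncpcon bind --ncpservername=C1_DHCP_SERVER --ipaddress=$ip" \
            "exit_on_error /opt/novell/dhcp/bin/cluster_dhcpd.sh -m /media/nss/$pool" \
            'exit 0'
    }

    # Example with a hypothetical volume ID of 251:
    # gen_dhcp_load_script DHCPNEW 251 10.198.110.3
    ```

    The unload script would need the matching change on its pooldeact line.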

    I do see in the docs that there's essentially a DHCP "relocation" script that's similar to iPrint.

    Now, if it were me:

    1) I'm assuming you're just migrating SANs, so the IP address CAN remain the same? In that case I would actually back up the volume (tape backup or something), delete it, recreate it and restore it. I think you'll have to recreate the cluster load scripts, but I THINK everything else would be the same.
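
    Whichever backup tool is used, a quick way to check that the restore brought back the same files is to compare a checksum listing from before and after. This is only a sketch: the function name `tree_manifest` and the /tmp file names are made up, the /media/nss/DHCP path is taken from the load script above, and it compares file contents only, not NSS trustees or attributes.

    ```shell
    # Sketch only: print a sorted "checksum size path" line for every regular
    # file under DIR, with paths relative to DIR so two trees can be compared.
    tree_manifest() {
        ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
    }

    # Hypothetical usage: take one manifest before the volume is deleted and
    # one after the restore, then compare them.
    # tree_manifest /media/nss/DHCP > /tmp/dhcp.before
    # ... back up, delete, recreate, restore ...
    # tree_manifest /media/nss/DHCP > /tmp/dhcp.after
    # diff /tmp/dhcp.before /tmp/dhcp.after && echo "contents match"
    ```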

    However, since this is a production environment, let's ask over in the Migration forum and see if Ramesh has an idea of what's actually needed. Basically we could treat this as an OES2 SP3 to OES2 SP3 migration within a cluster.

    --Kevin