[SOLVED] Upgrade from SLES10 SP3 & OES2 SP2 to SP4/SP3 breaks NCS

Hi folks,

(This post was originally meant to be a rant and a request for help, but
while writing the final paragraph I found the solution. It's still a
rant, but I figured I'd post my solution here in case someone else runs
into the same issue.)

I've just spent several hours banging my head against a broken cluster
node. My system is a 32-bit SLES 10 VM running on VMware ESX 3.5.x.

I upgraded from SLES10 SP3 and OES2 SP2 to the next service packs for
each (using the move-to-oes-sp3 script in yast2 online_update).
Everything went well for the first few update/reboot sequences, then
after the final reboot on SLES10 SP4
  • I feel your pain (I was in the same boat). For the SUSE stuff, I'd suggest maybe posting a short question in the SLES forums (i.e., why don't they handle upgrades like RHEL, where you can pick from the previous release) -- although I have random servers that show previous kernels for some reason. I think I may have posted something about that a while ago, but it WAS a bit ago and I ended up having to open an SR. But I'm glad you found a solution.

    Unfortunately VMware is not going to be "supporting" the VMI kernel for much longer -- although OES11 will be 64-bit only, so that'll nip THAT problem in the bud.
  • Hi Paul,

    First off, glad you managed to sort the issue or at least get it all running again.

    Not sure if it's related to the issue you've hit, but along the lines of what Kevin already mentioned, there are issues that can arise when using the VMI kernel (that I've seen) if multiple flavors of the kernel are installed alongside it (as in having both the kernel-vmi and kernel-smp packages installed) ...
    Curious, is that also the case with your setup? I've moved to only using the smp kernel on VMware (along with the clock=pit boot option to avoid time drift issues, or clock=pmtmr I think in your case when also running NCS services in the VM).
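
    For reference, the GRUB entry in /boot/grub/menu.lst would then look something like this (the kernel version and root device are just placeholders from my own setup, adjust for yours):

      title SLES 10 SP4 / OES2 SP3 (kernel-smp)
        root (hd0,0)
        kernel /boot/vmlinuz-2.6.16.60-0.85.1-smp root=/dev/sda2 clock=pit splash=silent showopts
        initrd /boot/initrd-2.6.16.60-0.85.1-smp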

    -Willem
  • On 13/10/11 01:16, kjhurni wrote:
    > ...
    > Unfortunately VMware is not going to be "supporting" the VMI kernel for
    > much longer -- although OES11 will be 64-bit only, so that'll nip THAT
    > problem in the bud.


    I'm curious: does anyone expect that there will be an upgrade path from
    32-bit SLES10/OES2 to 64-bit SLES11/OES-whatever? I've never found a
    single Linux distro that supported upgrading from 32-bit to 64-bit, no
    matter what the versions...
  • On 13/10/11 07:36, magic31 wrote:
    > ...
    > Not sure if it's related to the issue you've hit, but along the lines
    > of what Kevin already mentioned, there are issues that can arise when
    > using the VMI kernel (that I've seen) if multiple flavors of the
    > kernel are installed alongside it (as in having both the kernel-vmi
    > and kernel-smp packages installed) ...
    > Curious, is that also the case with your setup?


    The system in question has kernel-bigsmp and kernel-vmi installed. We
    only ever boot from kernel-vmi.
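
    (For anyone wanting to check their own nodes, this is what I used to
    confirm which flavours are installed and which one actually booted:

      rpm -qa 'kernel-*'
      uname -r

    rpm lists the installed kernel packages; uname -r shows the running
    kernel.)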

    > I've moved to only using the smp kernel on VMware (along with the
    > clock=pit boot option to avoid time drift issues, or clock=pmtmr I
    > think in your case when also running NCS services in the VM).


    When we installed the system (on OES2 SP1, I believe) it was a while
    ago, and kernel-vmi with clocksource=acpi_pm on the kernel command line
    was the only solution we could find to get reliable time. If there are
    updated best-practice documents, I'd be happy to hear about them.
    However, this is a production cluster and my boss is (rightly) rather
    reluctant to make major changes.
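
    For the record, the quick way to verify what a node actually booted
    with is:

      cat /proc/cmdline

    which should show the clocksource=acpi_pm option (along with whatever
    else is on the kernel line) if the right GRUB entry was used.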

    Paul
  • Paul Gear;2145985 wrote:
    On 13/10/11 01:16, kjhurni wrote:
    > ...
    > Unfortunately VMware is not going to be "supporting" the VMI kernel for
    > much longer -- although OES11 will be 64-bit only, so that'll nip THAT
    > problem in the bud.


    I'm curious: does anyone expect that there will be an upgrade path from
    32-bit SLES10/OES2 to 64-bit SLES11/OES-whatever? I've never found a
    single Linux distro that supported upgrading from 32-bit to 64-bit, no
    matter what the versions...


    Somewhat off-topic to what the OP is reporting... but I guess you've found a Linux vendor that does support 32-bit to 64-bit OES migrations, and it works well for OES2.
    Check out the migration matrix for OES2 SP3 and the upcoming OES 11.

    With SLES, AFAIK you can do an in-place upgrade from 32-bit to 64-bit... but I'd prefer backing up any relevant config files and reinstalling the system/root partition, as an upgrade might get messy depending on what 3rd-party products you have on there.
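
    As a rough sketch of what I mean by backing up the relevant bits (the paths below are just the usual suspects on an OES2 box, not a complete list):

      tar czf /root/oes2-config-backup.tar.gz /etc /var/opt/novell

    (eDirectory itself I'd back up properly with its own tools, dsbk or ndsbackup, rather than a file copy, and clustered data lives on the shared volumes anyway.)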

    -Willem
  • Paul Gear;2145987 wrote:
    On 13/10/11 07:36, magic31 wrote:
    > ...
    > Not sure if it's related to the issue you've hit, but along the lines
    > of what Kevin already mentioned, there are issues that can arise when
    > using the VMI kernel (that I've seen) if multiple flavors of the
    > kernel are installed alongside it (as in having both the kernel-vmi
    > and kernel-smp packages installed) ...
    > Curious, is that also the case with your setup?


    The system in question has kernel-bigsmp and kernel-vmi installed. We
    only ever boot from kernel-vmi.

    > I've moved to only using the smp kernel on VMware (along with the
    > clock=pit boot option to avoid time drift issues, or clock=pmtmr I
    > think in your case when also running NCS services in the VM).


    When we installed the system (on OES2 SP1, I believe) it was a while
    ago, and kernel-vmi with clocksource=acpi_pm on the kernel command line
    was the only solution we could find to get reliable time. If there are
    updated best-practice documents, I'd be happy to hear about them.
    However, this is a production cluster and my boss is (rightly) rather
    reluctant to make major changes.

    Paul


    The official VMware paper on timekeeping says that IF you're using SLES 10.x 32-bit, you should use the VMI kernel and NO clock kernel params (i.e., get rid of the clock=pit and clocksource=blah options).
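
    In other words, per that paper the kernel line on a 32-bit SLES 10 VM should end up with no clock-related options at all, something like (version and root device are placeholders again):

      kernel /boot/vmlinuz-2.6.16.60-0.85.1-vmi root=/dev/sda2 splash=silent showopts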

    But if 64-bit then you're okay.

    The easiest way, IMO (especially with VMware), is to use miggui (the migration utility). That's how I'm converting all my 32-bit servers in VMware to 64-bit. It works quite well.
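
    Roughly how that works (from memory, so double-check the migration docs): you build a fresh 64-bit SLES/OES target server, then on the target run

      miggui

    and point it at the old 32-bit source server; it copies the services/data across, and with a Transfer ID migration the new box takes over the old server's identity at the end.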

    --Kevin
  • kjhurni;2145995 wrote:
    The official VMware paper on timekeeping says that IF you're using SLES 10.x 32-bit, you should use the VMI kernel and NO clock kernel params (i.e., get rid of the clock=pit and clocksource=blah options).

    But if 64-bit then you're okay.

    The easiest way, IMO (especially with VMware), is to use miggui (the migration utility). That's how I'm converting all my 32-bit servers in VMware to 64-bit. It works quite well.

    --Kevin


    I'd go with Kevin on that. I've only had one production cluster running inside VMware, and that ran OK with that parameter...
    With all the recovery options virtualization brings, and with Linux in general, I've mostly been using standalone VMs. The only 'pain' there compared to clustering is the extra downtime needed when patching the OS, as you don't have the option of migrating resources. For us that has not really been an issue.

    -Willem
  • On 13/10/11 08:46, magic31 wrote:
    > ...
    > Somewhat off-topic to what the OP is reporting...


    I am the OP... ;-)
  • On 13/10/11 09:26, magic31 wrote:
    > ...
    > I'd go with Kevin on that. I've only had one production cluster
    > running inside VMware, and that ran OK with that parameter...
    > With all the recovery options virtualization brings, and with Linux
    > in general, I've mostly been using standalone VMs. The only 'pain'
    > there compared to clustering is the extra downtime needed when
    > patching the OS, as you don't have the option of migrating resources.
    > For us that has not really been an issue.


    If patching were as easy on SLES/OES as it is on Debian, that might be
    an option, but if we've only got 30 minutes of downtime, we can't
    possibly complete a service pack application in that time, even on a
    fast server with a local SMT mirror. Even the "smooth" service pack I
    did two days ago on our non-clustered OES2 master replica took well over
    90 minutes.

    Paul
  • Paul Gear;2146015 wrote:
    On 13/10/11 09:26, magic31 wrote:
    > ...
    > I'd go with Kevin on that. I've only had one production cluster
    > running inside VMware, and that ran OK with that parameter...
    > With all the recovery options virtualization brings, and with Linux
    > in general, I've mostly been using standalone VMs. The only 'pain'
    > there compared to clustering is the extra downtime needed when
    > patching the OS, as you don't have the option of migrating resources.
    > For us that has not really been an issue.


    If patching were as easy on SLES/OES as it is on Debian, that might be
    an option, but if we've only got 30 minutes of downtime, we can't
    possibly complete a service pack application in that time, even on a
    fast server with a local SMT mirror. Even the "smooth" service pack I
    did two days ago on our non-clustered OES2 master replica took well over
    90 minutes.

    Paul


    Depending on how many patches need to be applied... true, it can run up to a couple of hours (if you first have to apply the base SP and then the many post-SP fixes on top of it).
    Doing the patching in batches can cut down the time needed, depending on how much catching up the system has to do to reach the latest patch level. Not that that's ideal.

    This is one of the reasons I'm really looking forward to OES 11 on SLES 11, as zypper there is many, many times faster compared to 'good old rug'. :-)
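
    Just to illustrate the difference: on SLES 10 the update cycle is rug-based, e.g.

      rug refresh
      rug up -t patch

    while the SLES 11 equivalent is

      zypper refresh
      zypper patch

    and the zypper run resolves and downloads in a fraction of the time (commands from memory, so check the man pages for the exact options).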

    -Willem