Hi.
I recently ran into an odd issue at a customer. We had done a rolling cluster upgrade (4 nodes) from OES2018 to 24.2 without many issues. Then the first round of patches after the installation came, and we ran into the worst issue: after patching, the server (Dell) wouldn't boot.
This is what happened:
During the original installation, we answered "yes" to the question of whether we wanted to enable multipath. Without further configuration, this results in the local boot device also being accessed via device-mapper, which usually isn't a problem. The servers were rebooted countless times after the initial installation and always came up fine.
Until the first kernel upgrade, which we have just applied. The newly built initrd after the upgrade is not configured to load multipath at boot time. Result: the server doesn't boot after the update, as it can't find the root partition/boot device.
To add insult to injury: the kernel patch, during regular patching of OES, in fact rebuilds the initrd for *all* currently installed kernels. Yes, you read that right. Result: after patching, not even the previous kernel will boot, as dracut has destroyed the initrd of the previous, working kernel, too. You can clearly see this in the dracut output at the end of a regular "zypper patch" run: dracut builds new initrds for *all* kernels. I consider this a *HUGE* bug. Under absolutely *no* circumstances should an update to a new kernel touch the previous one (and destroy it).
Fix: start the rescue system, chroot to the boot disk (www.suse.com/.../), and have dracut rebuild the initrd *with* multipath support:
dracut -f --kver 5.3.18-150300.59.147-default --add multipath
(replace "5.3.18-150300.59.147-default" above with your most recently installed kernel version as visible in /boot)
Exit the chroot and reboot.
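Since the patch run rebuilds the initrds of all installed kernels, it may make sense to repair all of them in one go while still inside the chroot. The following is only a sketch under assumptions, not a tested procedure: it assumes that every installed kernel shows up as /boot/vmlinuz-<version>, and the function name rebuild_all_initrds and the APPLY variable are made up for this example. By default it only prints the dracut commands it would run.

```shell
#!/bin/sh
# Sketch: rebuild the initrd of every installed kernel with multipath support.
# Assumption: each installed kernel is visible as /boot/vmlinuz-<version>.
# Dry run by default; set APPLY=1 to really run dracut (as root, inside the chroot).

rebuild_all_initrds() {
    boot_dir="${1:-/boot}"
    for img in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded glob when no kernel image is found
        [ -f "$img" ] || continue
        # Strip the directory and the "vmlinuz-" prefix to get the kernel version
        kver="${img#"$boot_dir"/vmlinuz-}"
        if [ "${APPLY:-0}" = "1" ]; then
            dracut -f --kver "$kver" --add multipath
        else
            echo "dracut -f --kver $kver --add multipath"
        fi
    done
}

rebuild_all_initrds /boot
```

Check the printed list against what you see in /boot first, then run it again with APPLY=1 set.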
What is unknown as of now is whether that will be persistent, or whether it will break again when the next kernel patch is released.
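One way to find out before the next reboot, rather than after it, might be to check which initrds actually contain the dracut multipath module once "zypper patch" has finished. This is a rough sketch, assuming the initrds are named /boot/initrd-<version> and that "lsinitrd -m" lists the dracut modules one per line; the function names are made up for this example. Note that, as described in this post, an initrd can show multipath being added and still fail to boot, so a MISSING line is a clear warning, while an OK line is no guarantee.

```shell
#!/bin/sh
# Sketch: report which initrds in /boot contain the dracut "multipath" module.
# Assumption: initrds are named /boot/initrd-<version>.

# Reads a dracut module list (one name per line) on stdin;
# succeeds only if a line consisting of "multipath" is present.
has_multipath_module() {
    grep -qE '^[[:space:]]*multipath[[:space:]]*$'
}

check_initrds() {
    boot_dir="${1:-/boot}"
    for img in "$boot_dir"/initrd-*; do
        # Skip the unexpanded glob when no initrd is found
        [ -f "$img" ] || continue
        if lsinitrd -m "$img" 2>/dev/null | has_multipath_module; then
            echo "OK      $img"
        else
            echo "MISSING $img"
        fi
    done
}

# Only run the check when lsinitrd (shipped with dracut) is available
if command -v lsinitrd >/dev/null 2>&1; then
    check_initrds /boot
fi
```

Run it as root right after patching. Any initrd reported as MISSING would need the manual dracut run from above before you reboot.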
Attempts to "fix" this before patching the kernel, by adding multipath to the dracut configuration through several documented methods, have been unsuccessful so far. We tried:
And even though we *do* see "rd.driver.pre=dm_multipath" in the dracut output of the regular patch, it still won't boot and needs to be fixed using the above method.
We also tried "dracut --force --add multipath" before patching, again seeing multipath being added, but still, after patching it won't boot, yet again needing the manual process above.
So apparently, the only way that works (or rather, that I currently know works) at this point in time is to manually run dracut with the "--add multipath" option, to get a working kernel with multipath support at boot time.