Gathering OES23.4 In-Place Upgrade issues (and their solutions)

Hi. I want to start a thread collecting all real-life in-place upgrade issues, along with their possible solutions where known. I also want to discuss the upgrade process in general, and whether there isn't a lot of potential for improvement, because in my mind there are dozens of unnecessary and dangerous steps that the upgrade process performs for no reason.

So let's start:

1. Check whether /etc/sysconfig/novell/edir_oes2018_sp3 contains 'SERVICE_CONFIGURED="yes"'. If it says "no", the upgrade to OES2018SP3 never finished properly. Most likely, "yast channel-upgrade-oes" was never run, or it ran into an error that was never fixed and now comes back to haunt you. Your server is most likely working fine anyway, but your upgrade will fail miserably. (*)

Solution: Edit the file to read 'SERVICE_CONFIGURED="yes"' *before* attempting the upgrade. If you don't do this, the upgrade process will attempt to add your server to the eDir tree instead of upgrading it.
(IMNSHO this is a massive bug. Relying on some freely editable, notoriously unreliable config file that has no meaning whatsoever for the operation of the server to determine whether a server is an active part of an eDir tree or not is insane. Why not just ask eDir instead?)
Also, when this happens, verify the other OES2018SP3 files in /etc/sysconfig/novell, as most likely some of the others are wrong as well.
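A quick pre-flight check along those lines (just a sketch; back up the file first, and the sed pattern assumes the value is currently "no"):

grep SERVICE_CONFIGURED /etc/sysconfig/novell/edir_oes2018_sp3
# if it reports "no", fix it before starting the upgrade:
cp /etc/sysconfig/novell/edir_oes2018_sp3 /root/edir_oes2018_sp3.bak
sed -i 's/SERVICE_CONFIGURED="no"/SERVICE_CONFIGURED="yes"/' /etc/sysconfig/novell/edir_oes2018_sp3
# while you're at it, eyeball the other OES service flags:
grep SERVICE_CONFIGURED /etc/sysconfig/novell/*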

2. The upgrade *will* activate the firewall in all cases, which will block most non-standard traffic. Solution: Obviously, disable the firewall again after the upgrade, or configure it for your needs. (I personally consider server-side firewalls a completely broken idea.)
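If you just want the pre-upgrade behaviour back, a minimal sketch (assuming the SLES 15 default of firewalld):

systemctl disable --now firewalld
systemctl status firewalld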

3. The upgrade will alter your time sync config massively. If you have multiple time servers configured now, it will only carry over the first one, and it will add a public SUSE NTP pool to your setup without asking. On top of that (and this is nasty), it will stop your server from answering NTP requests, as the /etc/chrony.conf it creates does not contain an "allow" line. Many installations rely on OES servers as their time sources, and they will no longer work after the upgrade.

Solution: Edit /etc/chrony.conf (or use yast) and add back all your servers, plus an "allow" line if other machines sync their time from this server. Also, remove /etc/chrony.d/pool.conf (that is the public SUSE server pool you may not want).
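Roughly what that can end up looking like (a sketch with hypothetical server names and network range; adjust to your environment):

# /etc/chrony.conf
server ntp1.example.com iburst
server ntp2.example.com iburst
# let clients on your network query this server again
allow 192.168.0.0/16

rm /etc/chrony.d/pool.conf
systemctl restart chronyd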

4. Less important, but it may hit you anyway, especially when you run GroupWise: the upgrade will re-enable postfix even if it was disabled before. Solution: Disable postfix again if, for example, your GWIA no longer listens on port 25 and you need it to listen on more than one IP.
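A minimal check-and-fix sketch for that case (assuming GWIA runs on the same box):

ss -nlpt | grep ':25 '          # see which process is holding port 25
systemctl disable --now postfix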

More to come. Feel free to add to the discussion what you have found.

  • 0  

    5. Remote Manager is broken after the upgrade. See the thread "Remote Manager not working after 23.4 Upgrade" (it also has the workaround; no real solution yet).

    6. Occasionally during the eDir upgrade, for some as-yet-unknown reason, LDAPS won't come up despite everything being fine, resulting in an error message suggesting that the password may be wrong, time may not be in sync, or LDAP may not be working. Solution: While the error is shown, switch to another console (Ctrl-Alt-F2) and restart ndsd (systemctl restart ndsd). Verify LDAPS is up using "ss -nlp | grep 636".
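    A slightly more thorough version of that check (a sketch; localhost and the default LDAPS port 636 are assumptions on my part):

    systemctl restart ndsd
    ss -nlp | grep 636
    # optionally confirm the TLS handshake actually completes:
    openssl s_client -connect 127.0.0.1:636 </dev/null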

  • 0   in reply to   

    There is something I don't understand at all. I have 5 OES servers on VMware in live operation. The servers were migrated from OES 2018 SP3 to OES 23.4. I had successfully tested the following methods.

    All functions such as iManager, Remote Manager and UMC are working after the migration. One system was upgraded with the in-place update. I migrated one with yast2 wagon; yast2 wagon is not supported, as far as I have seen. I upgraded another one with an installation server, another one first with the ISO to OES 2023 and then on to 23.4, and the last one with AutoYaST.

    After the first run it was always important to finish the migration in the X Window session or in the VMware (browser) console. What was always important as a first step was the complete patching of the systems. Another thing is that there should never be a master replica on a server during the migration and that all replicas should be R/W. It is also important to ensure that the master replica is moved off before the server in question is migrated. It is clear that all possible checks, such as eDirectory health checks, should be carried out before the migration. I also always found "./oes_upgrade_check.pl all" helpful.
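    For reference, a rough pre-upgrade check sequence along those lines (a sketch; the path to oes_upgrade_check.pl depends on where you unpacked it):

    ndsstat                      # basic eDirectory agent status (tree and server name)
    ndsrepair -T                 # check time synchronization across replicas
    ndsrepair -E                 # report replica synchronization errors
    ./oes_upgrade_check.pl all   # OES-supplied pre-upgrade check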

    As far as I can tell, the issue that revolves around SERVICE_CONFIGURED="yes" comes from the fact that, after an upgrade to OES 2018 SP3, the migration was not completed successfully in X Windows / on the console. After the second restart, the DIB set is usually migrated and some other steps are performed.

    “You can't teach a person anything, you can only help them to discover it within themselves.” Galileo Galilei

  • 0  

    mrosen, you forgot many other troubles. I will say this was "a big catastrophe"; this is a no-go!

    Other troubles:

    DNS/DHCP server

    HTTP server

    NFS

    Graphical interfaces not working

    And one month after the release, not all problems are resolved!

    Why is it not possible for the installation to create an error log file with all installation troubles, so that it is possible to send this file to OT for analysis?

    These are production machines, and we do not want to end up looking like bad system integrators; we must be able to have more trust in OT and get fast, capable support.

    Who will be the next buyer of the Novell/MF products?

    Sorry, I am a long-time customer, but I have never seen anything like this in nearly 40 years!

  • 0  

    I forgot:

    7. SNMP is broken after the upgrade. Solution: "zypper in libsnmp40"

    See also: community.microfocus.com/.../
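    A quick way to confirm SNMP is back after that fix (a sketch; assumes you use the standard net-snmp snmpd service):

    zypper in libsnmp40
    rpm -q libsnmp40
    systemctl restart snmpd   # or whichever SNMP service you run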

  • 0   in reply to   

    FWIW, I didn't have any problems with DNS/DHCP, and virtually all servers I upgraded run those.

    *BUT*, and that is part of the discussion: in an upgrade from OES2018SP3 to OES 23.4, IMHO it is *completely* unnecessary to touch DNS, DHCP, LUM, LDAP and a few other configurations. And *of course* everything you touch, although you don't have to, has the potential to break. Especially if, as the OES upgrade has done forever, you do *NOT* take the configuration of the existing services *from* the service itself, but instead pull it from some absolutely meaningless file in /etc/sysconfig/novell, which may or may not match the true config.

    While we're at it: it is also absolutely unnecessary to "upgrade" the eDir schema or the NMAS methods on *every* server. This needs to be done exactly once per tree. That's another potentially dangerous step.

    Oh, and I forgot: using LDAPS for most of the configuration steps without reason, and without verifying *BEFORE* anything happens to the server that it is working flawlessly, is another of those sillinesses. NCP exists and is 100% reliable. LDAPS (especially the "S" part of it) has been a major nightmare for as long as OES has existed, and is simply not even remotely reliable enough.

  • 0   in reply to   

    Are you serious, or being ironic? Apart from the fact that a "yast2 wagon" upgrade from OES2018 to 2023 isn't just "not supported" but outright impossible, you are mainly talking about migration, which is a completely different story altogether. And finally, you think what you state about the replicas is OK? (Apart from the fact that it's not true: of course you can upgrade master replicas and all others, if you know what you're doing and your tree syncs properly.) But yes, it *can* lead to issues, which again comes back to the upgrade using LDAP instead of NCP to connect to a single (or, best case, two) dedicated eDir servers instead of, as NCP would, *any* currently working eDir server in the tree, which it would find on its own.

    Using LDAP over NCP for such tasks is like preferring a shovel over an excavator to dig the foundation of your house.

  • 0   in reply to   

    I had a fun one today. I just in-place upgraded 10 OES 2018.3 servers to OES 23.4 with virtually no issues (booting the ISO). On the 11th one, I now get this failure:

    Details:

    These are all VMs, so I aborted, rolled back, and checked, and that kernel isn't even installed:

    rpm -qa | grep kernel-default
    kernel-default-4.12.14-122.183.1.x86_64
    kernel-default-4.4.180-94.121.1.x86_64
    kernel-default-4.12.14-122.37.1.x86_64
    kernel-default-4.12.14-122.124.3.x86_64
    kernel-default-4.12.14-122.32.1.x86_64
    kernel-default-4.12.14-122.113.1.x86_64
    kernel-default-4.12.14-122.26.1.x86_64
    kernel-default-4.12.14-122.91.2.x86_64
    kernel-default-4.12.14-122.159.1.x86_64
    kernel-default-4.12.14-122.46.1.x86_64
    kernel-default-4.12.14-122.54.1.x86_64
    kernel-default-4.12.14-122.83.1.x86_64
    kernel-default-4.12.14-122.106.1.x86_64
    kernel-default-4.12.14-122.136.1.x86_64
    kernel-default-4.12.14-122.71.1.x86_64
    kernel-default-4.12.14-122.57.1.x86_64
    kernel-default-4.12.14-122.127.1.x86_64

    I tried it one more time: same issue. So I tried just seeing what happens if I do Ignore. It finishes the package installation, but then blows up after that. I never get to the Upgrade eDirectory question; instead I get this:

    I hit OK, and the server comes up to the login prompt. So the server still boots, but it never ran any of the OES 23.4 upgrade process. I ended up rolling back again.

    Wondering if anyone has any ideas on that one?

    Matt

  • 0   in reply to   

    This can be a space/size problem on your /boot partition. There are two solutions for this (a rough command sketch follows below):

    1. Remove all old kernel files and all *.gz files from /boot

    2. Increase the size of the /boot partition (more difficult)

    Then you can successfully install the new version, OES 23.4.

    You can also use PuTTY during the installation and remove the old kernel files from /boot.

    I had this problem, solved it myself, and reported it to OT.
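    A rough sketch of that cleanup (my assumptions: take a VM snapshot first, and use zypper rather than plain rpm so dependencies are resolved; the version below is just an example taken from the listing above):

    df -h /boot                                   # how much space is actually free
    rpm -qa | grep kernel-default | sort -V       # which old kernels are installed
    zypper rm kernel-default=4.4.180-94.121.1     # remove one old kernel at a time
    ls /boot/*.gz                                 # the old compressed images mentioned above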

  • 0   in reply to   

    That's not the issue; there is plenty of space. I suspected space on /boot as well, and that was the first thing I checked. This server is actually using EFI, so /boot isn't a partition; /boot/efi is, and there is 5.5 GB of space there with almost nothing used. In /boot (which I assume must then be on the root partition?), there is just under 1 GB of kernel files. And again, there is plenty of space (27 GB). Most of the servers I upgraded are configured identically and had no issues whatsoever. Only this one is having an issue.

    I did try removing one of the old kernels (rpm -e kernel-default-4.4.180-94.121.1.x86_64), but I got a huge list of dependency warnings, so I cannot remove it.

    Matt

  • 0   in reply to   

    Your decision! You can't go by the complete disk size; /boot is not large!
    I resolved my installation problem by removing these, and also the *.gz files.

    Try it and win!

    Keep only the latest kernel files!
    .vmlinuz-5.14.21-150400.24.97-default.hmac
    System.map-5.14.21-150400.24.97-default
    config-5.14.21-150400.24.97-default
    initrd -> initrd-5.14.21-150400.24.97-default
    initrd-5.14.21-150400.24.97-default
    vmlinuz -> vmlinuz-5.14.21-150400.24.97-default
    vmlinuz-5.14.21-150400.24.97-default

    Remove all *.gz files.

    Take a snapshot before, and test it!

    KEEP ONLY YOUR LATEST VERSION OF THE KERNEL FILES!
    You have old versions; OES 23.4 has 5.14.21....