SMAX 2020.05 on-premises CDF upgrade hangs on step 2/7: error parsing HTTP 400 response body
I encountered an issue when upgrading from 2020.02 to 2020.05; the SMAX 2020.02.003 patch had previously applied smoothly.
The issue occurs on the first master when I attempt to execute the ./upgrade.sh -i script. It hangs while pushing an image to the private registry (2/7): localhost:5000/kubernetes-vault-init:0.5.0
2020-10-07T18:22:49.315280904+02:00 DEBUG exec_cmd # /opt/kubernetes/bin/docker push localhost:5000/kubernetes-vault-init:0.5.0
The logs show it returned an error: error parsing HTTP 400 response body: unexpected end of JSON input: ""
For more information, see the excerpt from the upgrade logs below.
From then on, the upgrade process hangs as the script enters a retry loop with the same error message.
Has anyone experienced this issue? I attempted the upgrade twice, with the same result each time.
The first line indicates that you are already successfully logged into the local registry localhost:5000, which is fine.
What happens if you run these commands separately? It will not break anything on your system, but it may at least help us understand why the first push (for coredns) succeeded while the second one fails, or whether the first one now fails as well instead of reporting "already exists".
#docker push localhost:5000/hpeswitom/coredns:1.6.2
#docker push localhost:5000/kubernetes-vault-init:0.5.0
Can you also check whether the config.json located in /root/.docker looks correct, in particular any proxy settings?
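If it helps, here is a minimal sketch of the kind of sanity check I mean, assuming a hypothetical config file (the sample below only stands in for /root/.docker/config.json, whose real contents will differ):

```shell
# Write a hypothetical Docker client config; it stands in for
# /root/.docker/config.json, which holds registry logins and proxy settings.
cat > /tmp/sample-docker-config.json <<'EOF'
{
  "auths": {
    "localhost:5000": { "auth": "bm90LXJlYWw=" }
  },
  "proxies": {
    "default": { "httpProxy": "http://proxy.example.com:8080" }
  }
}
EOF

# A malformed config.json can cause odd registry errors; json.tool
# exits non-zero on invalid JSON, so this confirms the file parses.
python3 -m json.tool /tmp/sample-docker-config.json > /dev/null \
  && echo "config is valid JSON"

# Check whether a proxy section is configured for the Docker client;
# a proxy sitting between docker and localhost:5000 could mangle responses.
grep -q '"proxies"' /tmp/sample-docker-config.json \
  && echo "proxy section present"
```

If a proxy is configured, it is worth checking that localhost:5000 is excluded from it, since a proxy intercepting the push could explain a truncated 400 response body.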
The only difference I see is that coredns:1.6.2 belongs to the “hpeswitom” repository, while kubernetes-vault-init:0.5.0 sits directly under localhost:5000.
First, thank you for the prompt reply. Sorry for getting back to you this late; I wasn't feeling well the past week. I have currently paused the upgrade because a critical integration is in progress.
Perhaps I am looking in the wrong place, but I could not locate the config.json in /root/.docker.
More information that might help identify the root cause:
- I attempted to install the CDF Monitoring Pack; this tool installed a few images from Docker. The guide used: https://docs.microfocus.com/itom/Monitoring_Pack:1.1.50/Home
- After the upgrade failed, the main master can no longer fully access the mount points on the NFS. I believe this was caused by the upgrade script, as it removed/backed up all configuration, attempted to create a new one, and failed in the process. See the output from status.sh on the main master below; note that the rest of the nodes successfully get the mount points.
This started as an upgrade hang because Docker could not push some images to the local registry. Now it looks like your mount points are broken as well.
I would suggest checking whether the NFS exports are still visible from master 1 by running this command on the master 1 server:
#showmount -e FQDN_of_NFS_Server
Do the same from any other node where the mount points are still OK and compare the output.
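To illustrate the comparison, a sketch with made-up export lists (the host name and paths below are hypothetical; on the real nodes you would capture the `showmount -e FQDN_of_NFS_Server` output instead):

```shell
# Hypothetical export lists, standing in for showmount -e output
# captured on master 1 and on a healthy node.
cat > /tmp/exports-master1.txt <<'EOF'
Export list for nfs.example.local:
/var/vols/itom/core *
EOF

cat > /tmp/exports-node2.txt <<'EOF'
Export list for nfs.example.local:
/var/vols/itom/core *
/var/vols/itom/itsma-vol *
EOF

# diff exits non-zero when the lists differ; the +/- lines point at
# exports one node sees that the other does not.
if diff -u /tmp/exports-master1.txt /tmp/exports-node2.txt; then
  echo "export lists match"
else
  echo "export lists differ"
fi
```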
Then check whether the Persistent Volumes are still present on the cluster:
#kubectl get pv
Confirm that all Persistent Volume Claims (PVCs) are correctly bound (STATUS column) to the Persistent Volumes (PVs):
#kubectl get pvc -n itsma-xxx
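To spot unbound claims quickly, the STATUS column can be filtered with awk. The sample output below is hypothetical (made-up claim names and a Pending entry for illustration); against the live cluster you would pipe the real `kubectl get pvc -n <namespace> --no-headers` output instead:

```shell
# Hypothetical "kubectl get pvc --no-headers" output: columns are
# NAME STATUS VOLUME CAPACITY ACCESS-MODES AGE.
cat > /tmp/pvc-sample.txt <<'EOF'
db-volume Bound db-volume 10Gi RWX 5d
global-volume Bound global-volume 10Gi RWX 5d
smartanalytics Pending 10Gi RWX 5d
EOF

# Print only claims whose STATUS column is not "Bound"; a healthy
# cluster produces no output here.
awk '$2 != "Bound" { print $1, $2 }' /tmp/pvc-sample.txt
```

On a healthy cluster the filter prints nothing; any line it does print names a claim that needs attention.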
The warning messages you see there appear either because the PVCs are not bound to the PVs, because the folder owners and permissions have changed, or because the PVs are somehow missing and need to be reconfigured.
The best approach is to review the steps at this URL: https://docs.microfocus.com/itom/SMAX:2020.02/ConfigNFSShares
If the issue persists, I suggest opening a support case for further investigation.
PS: I am not familiar with the Monitoring Pack; I have not tested or installed it yet. I also cannot open your link https://docs.microfocus.com/itom/Monitoring_Pack:1.1.50/Home
Please bear in mind that the mount points only appeared like that after the upgrade hung.
Yes, the link for the Monitoring Pack tool no longer seems to work; see the newer links below for more information on the tool.
From the Marketplace - https://marketplace.microfocus.com/itom/content/monitoring-pack
Installation/uninstallation guide - https://docs.microfocus.com/itom/Monitoring_Pack:1.2.0/Install201911
Progress/steps taken so far:
- I followed the uninstall steps from the link above again, then re-ran the upgrade.sh -i script; this time it managed to push all the images.
In addition, when I run kube-status.sh, it displays only those two mount points on the NFS.
From the command you provided, it does show that they are bound; see below.
Well, at the Kubernetes cluster level all your PVCs seem to be correctly bound to their corresponding PVs. I don't think it is a permissions or file owner/group issue; in that case I would expect a failure, not just a warning.
We should sort out why kube-status.sh is displaying those two folders with warnings. Just guessing: is there still enough space on the NFS?
When you run kube-status.sh, log files are normally created under the /opt/kubernetes/log/scripts/kube-status folder.
What is the status of kube-status on another master, such as master 2? Does it also display the warning? If not, I would suggest comparing the kube-status log files from master 1 and master 2.
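A rough sketch of that comparison, using made-up log lines in place of the real files (the actual logs live under /opt/kubernetes/log/scripts/kube-status on each master and would be copied side by side first):

```shell
# Hypothetical kube-status log excerpts from master 1 and master 2;
# real files would be copied over from each node's log folder.
printf 'NFS check: /var/vols/itom/core ... WARNING\n' > /tmp/kube-status-master1.log
printf 'NFS check: /var/vols/itom/core ... OK\n' > /tmp/kube-status-master2.log

# diff exits non-zero when the logs differ; the filtered +/- lines show
# checks that warn on one master but pass on the other.
if ! diff -u /tmp/kube-status-master1.log /tmp/kube-status-master2.log > /tmp/kube-status.diff; then
  echo "logs differ:"
  grep -E '^[+-]NFS' /tmp/kube-status.diff
fi
```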
If the issue persists, I suggest opening a support case with the details we already have here. I don't think it is a good idea to upload your log files here.
Thank you very much. A support case has been logged and this discussion attached to fast-track the troubleshooting on Micro Focus's side.
1. The NFS does have enough space.
2. All the other nodes display all the mount points when executing kube-status.sh.