Hello,
I noticed that the standby node in our Application Failover (AF) cluster has been showing the STANDBY_DB_FAILED START_FAILED state for a few weeks:
  Local?    NodeType  State               OvStatus      Hostname/Address
  ------    --------  -----               --------      ----------------
* REMOTE    DAEMON    STANDBY_DB_FAILED   START_FAILED  node2/node2-17831
  LOCAL     DAEMON    ACTIVE_NNM_RUNNING  RUNNING       node1/node1-29522
  (SELF)    ADMIN     n/a                 n/a           node1/node1-57504
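For reference, the table above is the output of nnmcluster run on node1, captured roughly like this (default /opt/OV install path assumed):

/opt/OV/bin/nnmcluster -display    # show current cluster membership and node states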
nnmcluster-daemon.0.0.log shows the following, with a single WARNING line related to nmsdbmgr:
Sep 13, 2024 11:44:55.571 PM [ThreadID:43] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster viewAccepted: New cluster View accepted: [node2-13411|2] [node2-13411, node2-17831, node1-29522]
Sep 13, 2024 11:45:43.406 PM [ThreadID:43] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster viewAccepted: New cluster View accepted: [node2-13411|3] [node2-13411, node2-17831, node1-29522, node1-15523]
Sep 13, 2024 11:45:50.507 PM [ThreadID:43] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster viewAccepted: New cluster View accepted: [node2-17831|4] [node2-17831, node1-29522, node1-15523]
Sep 13, 2024 11:45:50.508 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterInfo : Detected controller change from node2-13411 to node2-17831. Requesting updated node info from all nodes
Sep 13, 2024 11:45:50.510 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster : Received updated node info NodeInfo(addr=node2-17831, type=DAEMON, state=QUERY_CONTROLLER, ovstatus=NOT_RUNNING, startTime=-1)
Sep 13, 2024 11:45:50.574 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster : Received updated node info NodeInfo(addr=node1-15523, type=ADMIN, state=NONDAEMON_READY, ovstatus=null, startTime=-1)
Sep 13, 2024 11:45:50.576 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster : Received updated node info NodeInfo(addr=node1-29522, type=DAEMON, state=ACTIVE_NNM_STARTING, ovstatus=STARTING, startTime=1,726,263,897,964)
Sep 13, 2024 11:56:00.655 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NodeStateTransition setState: Transitioning NodeState from QUERY_CONTROLLER to STANDBY_INITIALIZING
Sep 13, 2024 11:56:00.718 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NodeStateTransition setState: Transitioning NodeState from STANDBY_INITIALIZING to STANDBY_QUERY_DB
Sep 13, 2024 11:56:04.069 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NodeStateTransition setState: Transitioning NodeState from STANDBY_QUERY_DB to STANDBY_PREPWORK
Sep 13, 2024 11:56:07.233 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster startDbOnStandby: Starting NNM database.
Sep 13, 2024 11:56:07.234 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NodeStateTransition setState: Transitioning NodeState from STANDBY_PREPWORK to STANDBY_DB_STARTING
Sep 13, 2024 11:56:08.713 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E4.zip
Sep 13, 2024 11:56:08.824 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/nmsas/NNM/log/audit-2024-09-13.log
Sep 13, 2024 11:56:09.199 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E5.zip
Sep 13, 2024 11:58:06.301 PM [ThreadID:82] WARNING: com.hp.ov.nms.admin.nnmcluster.utils.ExecProc call: Command ("/opt/OV/bin/ovstart" "-c" "nmsdbmgr" ) returned non-zero exit status: 1
Sep 13, 2024 11:58:06.303 PM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NodeStateTransition setState: Transitioning NodeState from STANDBY_DB_STARTING to STANDBY_DB_FAILED
Sep 13, 2024 11:59:09.088 PM [ThreadID:56] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster viewAccepted: New cluster View accepted: [node2-17831|5] [node2-17831, node1-29522]
Sep 14, 2024 12:00:01.479 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/nmsas/NNM/log/audit-2024-09-13.log
Sep 14, 2024 12:00:28.377 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E6.zip
Sep 14, 2024 12:00:28.440 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E7.zip
Sep 14, 2024 12:00:28.689 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E8.zip
Sep 14, 2024 12:00:29.080 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000E9.zip
Sep 14, 2024 12:00:29.712 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000EA.zip
Sep 14, 2024 12:05:39.637 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.core.FileService : Removed /var/opt/OV/shared/nnm/databases/Postgres_standby/PostgresBackup.zip from the queue because it is no longer on the active
Sep 14, 2024 12:05:39.677 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000EB.zip
Sep 14, 2024 12:05:40.149 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000EC.zip
Sep 14, 2024 12:10:52.498 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.core.FileService : Removed /var/opt/OV/shared/nnm/databases/Postgres_standby/PostgresBackup.zip from the queue because it is no longer on the active
Sep 14, 2024 12:10:52.540 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.ClusterFileReceiver : Receiving file: /var/opt/OV/shared/nnm/databases/Postgres_standby/TxWALs_recv/0000002300001B9C000000ED.zip
Sep 14, 2024 12:13:04.273 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster : Rejecting db transfer while in state STANDBY_DB_FAILED
Sep 14, 2024 12:13:04.273 AM [ThreadID:28] INFO: com.hp.ov.nms.admin.nnmcluster.NnmCluster : Rejecting db transfer while in state STANDBY_DB_FAILED
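The ovstart WARNING above is the only error in that log. In case it helps, the same start that the cluster daemon attempts can also be run by hand on the standby to get more detail (default /opt/OV paths assumed; the Postgres_standby directory is the one the WAL files above are received into):

/opt/OV/bin/ovstatus -c                                      # overall process status, including nmsdbmgr
/opt/OV/bin/ovstart -c nmsdbmgr                              # same command the daemon ran (exit status 1 per the WARNING)
ls -lt /var/opt/OV/shared/nnm/databases/Postgres_standby/    # standby database directory seen in the log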
Does anyone know how to fix this?
I read this article: Application Failover Cluster got status STANDBY_DB_FAILED START_FAILED (microfocus.com)
but it does not fit our case, because the standby node will not start normally: its embedded database cannot start.
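Would forcing a full database resync from the active node be the right next step here? I am thinking of something like this, assuming nnmcluster -dbsync is still the supported way to push a fresh backup to the standby:

/opt/OV/bin/nnmcluster -dbsync    # run on the active node (node1): request a new DB backup and transfer it to the standby

Or is there something that has to be cleaned up on node2 first before the embedded DB will start?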