Hi all,
We have this setup for our DP 23.4 backups of our VMware VMs:
- RHEL 7.9 DP Cell Manager VM
- 2 x RHEL 7.9 physical media servers
- 1 x HPE StoreOnce D2D
- physical servers are SAN connected to the D2D
Backups have been working fine for over 4 weeks: approx 500 VMs to back up in approx 100 jobs, spread over the night from 16:00 to 03:30, 7 days a week (6 incrementals, 1 full). Great, DP working as designed. Then twice this week, 99% of the nightly jobs failed with this error after successful VMware snapshots:
[61:2052] Bar backup session was started but no client connected in 600 seconds. Aborting session!
A couple of jobs work; the rest all hit this issue. The first time it happened we rebooted the D2D and both media servers and tested a backup, which worked, and the next two nights of 100 jobs ran fine. OK, problem "solved". Then last night, again a 99% failure rate with the 600 seconds error. So the second time round we rebooted only the D2D and changed the timeout to 30 mins as per this page in the manual:
https://docs.microfocus.com/doc/Data_Protector/24.1/ZDBBarBackupSessionAborted
The re-run worked, but it hung for 18 mins waiting to start the backup. Hmm, we thought, this hasn't fixed the issue, it has only allowed the job to hang for more than 600 seconds before it starts. So we rebooted both RHEL physical media servers and re-ran the job, and instead of hanging for 18 mins, it hung for about 2 seconds between snapshot and starting backups! So there seems to be something wrong on the RHEL physical media servers which is hanging backups, but we don't know what.
Has anyone else seen this type of behaviour before? Any suggestions of what to look out for? Oh, and we can't send debug logs to OpenText, as the customer does not allow it, so our only options for tracking down the cause are to raise forum posts and to trawl through 60,000 lines of debug output looking for the cause of the 18 minute hang.
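For the trawl itself, a small script along these lines could flag where the timestamps in the debug output jump, which is where an 18 minute hang should show up. It is only a sketch: the timestamp pattern and format below (`YYYY-MM-DD HH:MM:SS` somewhere in each line) are assumptions, not the confirmed DP debug log layout, and `find_gaps` is just an illustrative name, so both would need adjusting to match the real files.

```python
import re
import sys
from datetime import datetime

# ASSUMPTION: lines carry a timestamp like "2024-05-01 02:13:45".
# Adjust the pattern and format to whatever the debug files really contain.
TS_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def find_gaps(lines, min_gap_seconds=60):
    """Return (gap_seconds, line_before, line_after) tuples, largest gap first.

    Lines without a recognisable timestamp are skipped, so a gap is measured
    between consecutive timestamped lines. Line numbers are 1-based.
    """
    gaps = []
    prev_ts = None
    prev_no = None
    for no, line in enumerate(lines, start=1):
        match = TS_PATTERN.search(line)
        if not match:
            continue
        ts = datetime.strptime(match.group(1), TS_FORMAT)
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((delta, prev_no, no))
        prev_ts, prev_no = ts, no
    return sorted(gaps, reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_gaps.py <debug-log-file>
    with open(sys.argv[1], errors="replace") as handle:
        for seconds, before, after in find_gaps(handle):
            print(f"{seconds:8.0f}s gap between line {before} and line {after}")
```

The lines either side of the biggest gap are the ones worth reading closely, since whatever the process was waiting on is presumably logged just before the silence.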
Thanks,
Andy