DP 10.30 / Service Guard / Huge restore of 52TB fails randomly.

over 1 year ago

A very large restore needs to be performed; the total size is 52 TB. The restore starts normally, but at some point it simply crashes.

The failure point varies: it can happen at 14 TB, at 3 TB, or even earlier.

DP 10.30 running on RHEL 7 with Service Guard.

Solution

---Investigation---

While investigating the problem, we saw that the cluster was switching from one node to the other at arbitrary moments.

omnisv reported a false negative, saying that HPDP-AS was down when it was not.
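
If you want to correlate the failovers with what omnisv reports, a minimal sketch like the one below could log every moment a service is reported as not active. It is only an illustration under assumptions: the path `/opt/omni/sbin/omnisv` and the line shape `<name> : <Status> [<pid>]` are assumed and may differ on your system, and the names `services_not_active` and `poll` are hypothetical helpers, not part of Data Protector.

```python
import re
import subprocess
import time
from datetime import datetime

# Assumed location of omnisv on the Cell Manager; adjust for your system.
OMNISV = "/opt/omni/sbin/omnisv"

# Assumed line shape: "<name> : <Status> [<pid>]" with an optional PID.
# Lines that do not fit this shape (headers, summary text) are ignored.
LINE_RE = re.compile(r"^\s*([\w-]+)\s*:\s*(\w+)\s*(?:\[\d+\])?\s*$")


def services_not_active(status_text):
    """Return the names of services whose reported status is not 'Active'."""
    down = []
    for line in status_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2).lower() != "active":
            down.append(match.group(1))
    return down


def poll(interval_seconds=30):
    """Run 'omnisv -status' repeatedly and print a timestamp when a service looks down."""
    while True:
        result = subprocess.run([OMNISV, "-status"], capture_output=True, text=True)
        down = services_not_active(result.stdout)
        if down:
            print(f"{datetime.now().isoformat()} not active: {', '.join(down)}")
        time.sleep(interval_seconds)


if __name__ == "__main__":
    poll()
```

Comparing the timestamps printed by this loop with the Service Guard logs may help confirm whether each node switch lines up with a moment when HPDP-AS was reported as down.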

 

---Solution---

This incorrect output causes Service Guard to fail over to the other node, which crashes the restore.

To make it work, we disabled Service Guard and kept only one node running. The restore then completed with no issues.

 

Note: This is a temporary workaround. The incorrect omnisv output is still under investigation, but the workaround can help if you need to run a restore and see the same behavior.

Labels:

Support Tip