I have several brand-new Dell PowerEdge R710 servers with lots of memory,
each with 2x Emulex 12000 HBAs, connected through a Brocade 48000 to an EMC
CX400. They run SLES 10 SP3 with OES SP3 Cluster Services.
These servers are not in production (yet).
They use the in-kernel Emulex driver and native Linux multipath.
All is working.
If I disconnect one HBA, everything still works (as expected).
If I then disconnect the second HBA, users still appear able to read and
write to the cluster volumes on the SAN for about 20 seconds.
Then the cluster volumes fail over to another node, but of course everything
written during those 20 seconds is lost.
I tried changing the polling interval in /etc/multipath.conf from 10 to 1,
but it still takes 20 seconds.
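The change is along these lines, assuming the setting in question is
polling_interval in the defaults section (a sketch, not the complete file):

```
defaults {
    # Seconds between path checks; previously 10.
    # Option name and section are an assumption, see above.
    polling_interval 1
}
```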
I don't even know if it's the SAN that takes too long to detect the fault,
the cluster software, or both.
Can someone help me?
Thanks in advance, TA.