HBA failure takes more than 20 seconds

Hi all.



I have several brand-new Dell PowerEdge R710 servers with lots of memory,
each with 2x Emulex 12000 HBAs, connected through a Brocade 48000 to an EMC CX400.
They run SLES 10 SP3 with OES SP3 Cluster Services.

These servers are not in production yet.

They use the in-kernel Emulex driver and native Linux multipath.

All is working.



If I disconnect one HBA then all is still working (as expected).

If I then disconnect the second HBA, users appear to be able to keep reading
and writing to the cluster volumes on the SAN for about 20 seconds.

Then the cluster volumes fail over to another node, but of course everything
written during those 20 seconds is lost.



I tried changing the polling interval in /etc/multipath.conf from 10 to 1, but
it still takes 20 seconds.

I don't even know if it's the SAN that takes too long to detect the fault,
the cluster software, or both.



Can someone help me?



Thanks in advance, TA.


  • On 31.05.2011 23:55, Tiago Abreu wrote:
    >
    > I don't even know if it's the SAN that takes too long to detect the fault,
    > the cluster software, or both.


    Neither. It's most likely the Emulex driver. I suggest you check how to
    set its timeouts (I don't know enough about Emulex to be of much help,
    just pointing out where to look).

    CU,
    --
    Massimo Rosen
    Novell Product Support Forum Sysop
    No emails please!
    http://www.cfc-it.de
  • Hi all.
    Thanks Massimo, you pointed me in the right direction.

    I've read a lot of documentation from EMC, Emulex, Novell, IBM, etc.
    The solution:
    echo "3" > /sys/class/scsi_host/host3/lpfc_devloss_tmo
    echo "3" > /sys/class/scsi_host/host5/lpfc_devloss_tmo

    The active Emulex HBAs are host3 and host5. The old value was 30, so I've changed the timeout from 30 seconds to 3 seconds.

    I only have one more question: where should I put this?
    Is there a file for this type of thing? I've looked in /etc/sysconfig/* and /etc/sysctl.conf. Is there somewhere else?
    Can I use yast2 to put this somewhere?

    Thanks in advance, TA.
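
A minimal sketch for checking which hosts expose the Emulex attribute and what it is currently set to, before or after changing it. The function name show_lpfc_devloss_tmo and the SCSI_HOST_DIR override are hypothetical, introduced only for illustration. The sysfs path and the attribute name lpfc_devloss_tmo are the ones from the commands above.

```shell
#!/bin/sh
# Hypothetical override so the function can be pointed at a scratch
# directory; by default it reads the real sysfs tree used above.
SCSI_HOST_DIR="${SCSI_HOST_DIR:-/sys/class/scsi_host}"

# Print "<host> <value>" for every SCSI host that has lpfc_devloss_tmo.
show_lpfc_devloss_tmo() {
    for attr in "$SCSI_HOST_DIR"/host*/lpfc_devloss_tmo; do
        # Skip if the glob matched nothing or the file is unreadable
        [ -r "$attr" ] || continue
        host=$(basename "$(dirname "$attr")")
        echo "$host $(cat "$attr")"
    done
}
```

Calling show_lpfc_devloss_tmo on the setup described above should list host3 and host5, since those are the active Emulex hosts there.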
  • I'm not sure about Emulex. For the QLogic drivers, there's a special command-line parameter that you put in the
    /etc/modprobe.conf.local

    file.

    However, since the commands you listed for Emulex appear to be in a different format, I THINK you'd have to put them into the

    /etc/init.d/boot.local

    file.

    But again, I'm only familiar with the qla2xxx stuff (where it's pretty well documented what to put and where; I'm surprised Emulex doesn't document this).
  • Hi all.

    The Emulex driver uses sysfs, so "normal" bash commands work.
    I couldn't find Emulex documentation for this, although I looked for it. Maybe it exists.

    I ended up putting the commands in /etc/init.d/boot.local, and it worked.

    Thanks, TA.
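
For anyone adding this to /etc/init.d/boot.local, here is a minimal sketch that avoids hard-coding host3 and host5, on the assumption that SCSI host numbers may not stay the same across reboots or hardware changes. It writes the timeout to every host that exposes lpfc_devloss_tmo. The function name set_lpfc_devloss_tmo and the SCSI_HOST_DIR override are hypothetical and not part of any Emulex tooling.

```shell
#!/bin/sh
# Hypothetical override so the function can be tried against a scratch
# directory; by default it targets the real sysfs tree.
SCSI_HOST_DIR="${SCSI_HOST_DIR:-/sys/class/scsi_host}"

# Write the given timeout (in seconds) to every host that has the
# lpfc_devloss_tmo attribute. Hosts without it are left alone.
set_lpfc_devloss_tmo() {
    tmo="$1"
    for attr in "$SCSI_HOST_DIR"/host*/lpfc_devloss_tmo; do
        # Skip if the glob matched nothing or the file is not writable
        [ -w "$attr" ] || continue
        echo "$tmo" > "$attr"
    done
}
```

In boot.local the function definition would be followed by a single call such as set_lpfc_devloss_tmo 3, which matches the 3-second value used in this thread.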