Oracle database backup fails with error 61:3003 (lost connection) and 10054 (connection reset by peer)

We have an Oracle RAC database of about 2.5 TB.

After the backup had run for around 6 or 7 hours, the "DP managed control file backup" finished successfully, and was then followed by this error:

[61:3003] Lost connection to BMA xxx on host xxx
Ipc subsystem reports: "IPC Read Error System error: [10054] Connection reset by peer.

The amount of data backed up was about 2.5 TB, nearly equal to the size of the database.

There is no firewall between the Cell Manager (CM) and the client. The firewall on the CM and IPsec on the client are both disabled.
The CM runs Windows 2003.
The client acts as both the Disk Agent (DA) and the Media Agent (MA); it is a RHEL 5.8 system.

  • I found the tip below in help.chm and will try it:

     

    Troubleshooting Networking and Communication 

     

    You can configure the TCP/IP protocol to use 8 instead of the default 5 retransmissions. It is better not to use higher values because each increment doubles the timeout. The setting applies to all network connections, not only to connections used by Data Protector.
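To see why the help text warns against going above 8, here is a rough sketch of the arithmetic. It assumes a simplified model in which the first send waits one timeout and every retransmission waits twice as long as the previous attempt. The function name `total_wait` and the 1-second starting timeout are illustrative assumptions, not values taken from Data Protector or Windows.

```python
def total_wait(initial_timeout: float, retransmissions: int) -> float:
    """Seconds spent waiting before the connection is given up.

    Simplified model (an assumption): the original send waits
    `initial_timeout`, and each retransmission waits twice as long
    as the attempt before it.
    """
    return sum(initial_timeout * 2 ** attempt
               for attempt in range(retransmissions + 1))


# Hypothetical 1-second starting timeout, default 5 vs. suggested 8 retries.
print(total_wait(1.0, 5), total_wait(1.0, 8))  # 63.0 511.0
```

Under this model, going from 5 to 8 retransmissions stretches the total wait by roughly a factor of eight, which matches the remark that each increment doubles the timeout.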

    If the Cell Manager is running on a Windows system, apply the change on the Cell Manager system first. If the problem persists or if the Cell Manager is running on a UNIX system, apply the change to the problematic Windows clients.

     

    1. Add the DWORD parameter TcpMaxDataRetransmissions and set its value to 0x00000008 (8) under the following registry key:

      HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

    2. Restart the system.
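If the registry change has to be applied on more than one Windows system, one option is to generate a .reg file and import it on each machine. The sketch below only builds the file text; the helper name `render_reg_file` is made up for illustration, and it assumes the standard "Windows Registry Editor Version 5.00" text format. Review the output before importing it, and still restart the system afterwards as described in step 2.

```python
TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def render_reg_file(key: str, name: str, value: int) -> str:
    """Return .reg file text that sets one DWORD value under `key`."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        # DWORD data is written as eight hexadecimal digits.
        f'"{name}"=dword:{value:08x}\r\n'
    )


reg_text = render_reg_file(TCPIP_KEY, "TcpMaxDataRetransmissions", 8)
print(reg_text)
```

Saving `reg_text` to a file such as `tcp_retrans.reg` (a hypothetical name) and importing it from an administrator account should have the same effect as adding the value by hand.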

  • KeepAlive parameters also work with DP 6.xx.

    I suggest setting these in the omnirc file on the client and then restarting DP Inet.
    OB2INETTIMEOUT=60
    OB2SHMIPC=0
    OB2IPCKEEPALIVE=1
    OB2IPCKEEPALIVETIME=900
    OB2IPCKEEPALIVEINTERVAL=60
    OB2RECONNECT_RETRY=3600
    OB2RECONNECT_ACK=3600
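A small script can apply the settings above without duplicating variables that are already in the file. This is only a sketch: it assumes the omnirc file consists of plain `NAME=value` lines with `#` comments, and the helper name `merge_omnirc` is made up for illustration. It works on text, so reading and writing the actual omnirc file on the client is left to the caller. Back the file up first.

```python
from typing import Mapping

# Values copied from the suggestion above.
KEEPALIVE_SETTINGS = {
    "OB2INETTIMEOUT": "60",
    "OB2SHMIPC": "0",
    "OB2IPCKEEPALIVE": "1",
    "OB2IPCKEEPALIVETIME": "900",
    "OB2IPCKEEPALIVEINTERVAL": "60",
    "OB2RECONNECT_RETRY": "3600",
    "OB2RECONNECT_ACK": "3600",
}


def merge_omnirc(text: str, settings: Mapping[str, str]) -> str:
    """Return omnirc text with `settings` applied.

    Existing assignments are updated in place, repeated assignments of
    the same variable are dropped, and missing variables are appended.
    Comments and unrelated lines are kept unchanged.
    """
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        name = stripped.split("=", 1)[0].strip()
        is_assignment = "=" in stripped and not stripped.startswith("#")
        if is_assignment and name in settings:
            if name not in seen:
                out.append(f"{name}={settings[name]}")
                seen.add(name)
        else:
            out.append(line)
    out.extend(f"{k}={v}" for k, v in settings.items() if k not in seen)
    return "\n".join(out) + "\n"


example = "# site settings\nOB2SHMIPC=1\nOTHER=x\n"
merged = merge_omnirc(example, KEEPALIVE_SETTINGS)
print(merged)
```

Running the merge a second time on its own output leaves the text unchanged, so the script is safe to re-run.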