Got unexpected close from RMA

We have been getting the Major error "Got unexpected close from RMA" for a while now, and it is becoming really frustrating. It happens at random points: sometimes at 5%, sometimes at 17%, sometimes at 50%. The error appears when we try to write the data from disk to tape. What could be wrong? We did not change anything in the backup specification.

We are using HP Data Protector 6.20 on Red Hat Linux.

 

Thanks in advance.

  • Hello Andrej,

     

    If you have an xMA unpredictably dropping its socket connection, you could be facing a system resource or network stability issue.

     

    I gather from the first response that your issue crops up during a D2T copy.  That means that you have RMAs reading the source data and passing it to BMAs writing to destination media.  All the while, there is control communication between the xMAs and the CSM on the cell manager, most of it catalog updates from the BMAs.

     

    Are you running all of the agents on the cell manager?  Or do you have a dedicated media server?  Are the RMAs and BMAs running on the same server?  Or are you passing all of that data across a network connection between machines?  Check /var/log/messages for entries that correlate with the times that you've had RMA problems.

     

    Posting a complete set of session messages from a failed session would be helpful.

     

    Thanks,

    Mr_T

    DPTIPS
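
To make the /var/log/messages check suggested above a bit more systematic, here is a minimal Python sketch that pulls out the syslog entries logged close to the time a session failed. The function name `entries_near`, the 60-second default window, and the assumption that the log uses the classic "Mon D HH:MM:SS host message" prefix (which carries no year) are all illustrative choices, not part of Data Protector.

```python
import re
from datetime import datetime, timedelta

# Classic syslog prefix, e.g. "Nov 9 10:46:14 host rest-of-line" (no year in the log).
SYSLOG_RE = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")


def entries_near(lines, failure_time, window_seconds=60, year=None):
    """Return (timestamp, message) pairs logged within window_seconds of failure_time.

    `year` defaults to the year of failure_time because syslog lines omit it.
    """
    year = year or failure_time.year
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not look like syslog entries
        month, day, clock, _host, message = m.groups()
        stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        if abs(stamp - failure_time) <= window:
            hits.append((stamp, message))
    return hits


if __name__ == "__main__":
    # Hypothetical failure time; take the real one from the session messages.
    failed_at = datetime(2016, 11, 9, 10, 46, 14)
    with open("/var/log/messages") as log:
        for stamp, message in entries_near(log, failed_at):
            print(stamp, message)
```

Anything from the kernel, xinetd, or abrtd that lands inside the window is a good candidate to post along with the session messages.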

  • Mr_T:

    Here is the current error from the log. Could you please enlighten me as to what is going on?

    [root@ret-rh1p log]# tail messages
    Nov 9 10:46:13 ret-rh1p xinetd[6008]: START: omni pid=55085 from=::ffff:10.21.3.206
    Nov 9 10:46:13 ret-rh1p OB2DBG_StoreOnceSoftware_Debug.txt[64970]: -0800 2016-11-09 10:46:13 StoreOnceSoftware is within the memory capacity limit
    Nov 9 10:46:14 ret-rh1p xinetd[6008]: EXIT: omni status=0 pid=55081 duration=2(sec)
    Nov 9 10:46:14 ret-rh1p kernel: rma[55085]: segfault at 7fc84ef7c000 ip 00007fc84f4343dc sp 00007ffd98f9de30 error 4 in libserializer_64bit.so[7fc84f408000 52000]
    Nov 9 10:46:14 ret-rh1p abrtd: Directory 'ccpp-2016-11-09-10:46:14-55085' creation detected
    Nov 9 10:46:14 ret-rh1p abrt[55100]: Saved core dump of pid 55085 (/opt/omni/lbin/rma) to /var/spool/abrt/ccpp-2016-11-09-10:46:14-55085 (80351232 bytes)
    Nov 9 10:46:14 ret-rh1p xinetd[6008]: EXIT: omni signal=11 pid=55085 duration=1(sec)
    Nov 9 10:46:14 ret-rh1p abrtd: Package 'OB2-MA' isn't signed with proper key
    Nov 9 10:46:14 ret-rh1p abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-11-09-10:46:14-55085' exited with 1
    Nov 9 10:46:14 ret-rh1p abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2016-11-09-10:46:14-55085'

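  • You are getting an RMA crash, so I would configure Red Hat to capture the core file and then send it to HP via a support case. Note that in the log above, abrtd saves a core dump and then deletes the problem directory right after reporting that package 'OB2-MA' isn't signed with proper key, so the core is not being kept as things stand.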
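
While the core file is being sorted out, the kernel segfault line itself already carries a few details worth putting in the support case. The sketch below is only an illustration: `parse_segfault` is a made-up helper, the regular expression assumes the usual x86 Linux message layout, and it accepts either `+` or a space between the library base address and size because the pasted log shows a space there. The error-code bits are decoded according to the x86 page-fault convention (bit 0: page was present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches e.g. "rma[55085]: segfault at ADDR ip IP sp SP error N in lib.so[BASE SIZE]"
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)[+ ](?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Extract process, fault details, and the offset of the faulting
    instruction inside the mapped library from a kernel segfault line.
    Returns None if the line is not a segfault message."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"))
    info = {
        "process": m.group("proc"),
        "pid": int(m.group("pid")),
        "fault_address": int(m.group("addr"), 16),
        "page_present": bool(err & 1),  # 0 means the page was not mapped
        "write": bool(err & 2),         # 0 means the access was a read
        "user_mode": bool(err & 4),     # 1 means the fault happened in user space
        "library": m.group("lib"),
        "offset": None,
    }
    if m.group("base") is not None:
        # Instruction pointer minus library base = offset within the library.
        info["offset"] = int(m.group("ip"), 16) - int(m.group("base"), 16)
    return info


LINE = ("Nov 9 10:46:14 ret-rh1p kernel: rma[55085]: segfault at 7fc84ef7c000 "
        "ip 00007fc84f4343dc sp 00007ffd98f9de30 error 4 in "
        "libserializer_64bit.so[7fc84f408000 52000]")

if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(info["process"], info["library"], hex(info["offset"]))
```

For the line posted above this gives offset 0x2c3dc inside libserializer_64bit.so, and error 4 decodes to a user-mode read of a page that was not mapped. The offset stays comparable between crashes even when the library is loaded at a different address, which makes it easier to tell whether repeated crashes hit the same spot.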


  • wrote:
    Nov 9 10:46:14 ret-rh1p kernel: rma[55085]: segfault at 7fc84ef7c000 ip 00007fc84f4343dc sp 00007ffd98f9de30 error 4 in libserializer_64bit.so[7fc84f408000 52000]

    Having the RMA segfault in the serializer lib (either during copies or during restores) for StoreOnceSoftware objects that otherwise pass a Verify is exactly what I had here after upgrading to 9.06, but on Windows. I've been working with HPE on a case for that and was finally told the bug in question should be fixed in 9.08. I just upgraded and indeed, the very same objects that made the RMA crash before are now copied without any fuss. So if you are on 9.06 or 9.06_108 (dunno about 9.07), upgrading to 9.08 might be a good way to get rid of that crash (provided we are talking about the same root cause).

    HTH,
    Andre.
