IPC Read Error System error: [10053]

Join me in the quest to slay the beast.

The creature has nasty, big, pointy teeth...

 

I get the following error when trying to back up my Imail Server's D: drive, where the email storage is kept:

[Critical] From: BDA-NET@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM

IPC failure reading NET message (IPC Read Error System error: [10053] Software caused connection abort) => aborting.

 

Software/Hardware:

Data Protector: 8.13

Source/Disk Agent - Virtual: VMware vSphere 5.5, OS: Windows Server 2012 R2 Standard, network: DMZ, Storage: 3PAR

Cell Server - OS: CentOS 6, network: LAN, Storage: Local

B2D Target/MediaAgent - OS: CentOS 6, network: LAN, Storage: Local

Firewall - Cisco ASA 5510 running ASDM 7.3

 

Backup Specs:

I originally had "Reconnect broken connections" enabled; this flooded my logs with "trying to reconnect" messages until it timed out. Turning it off gives me the error above.

Enhanced Incremental Backup; Use native Filesystem Change Log Provider if available (D: drive set to max 4 GB, delta 2 GB with omnicjutil); Display statistical info; Open files: number of retries 3, timeout 10; MS VSS: Use Shadow Copy, Allow Fallback.

 

Backstory:

The email server running Imail was originally a VM running an older version of Imail on Server 2008, with no issues backing up. A little while ago a new VM was set up and 

 

Data not being backed up:

About 90 GB of mailbox files stored on a separate virtual disk (D:) attached to the VM running Imail Server. Depending on how I change the omnirc settings, the backup gets between 1% and 5% of the way through before I get the error.

 

Possible Clue:

I can back up filesystem data from the Email Server's C: drive without issue.  

 

 

omnirc settings:

Source: 

OB2PORTRANGE=5555-5650
OB2IPCKEEPALIVE=1
OB2IPCKEEPALIVETIME=600
OB2IPCKEEPALIVEINTERVAL=60
OB2SHMIPC=0
OB2PORTRANGESPEC=5555-5650
OB2INETTIMEOUT=120

 

The B2D target and the Cell Server use the same settings, except that the keepalive time/interval settings follow this Support Tip.
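Because the omnirc files were changed in several variations across three machines, a small script can show exactly which settings differ between two hosts. This is a hypothetical helper, not a Data Protector tool; it assumes omnirc holds one KEY=VALUE pair per line with lines starting with # as comments, and the target value in the example is a placeholder, not the one from the Support Tip.

```python
from __future__ import annotations

# omnirc lines copied from the source (email) server above.
SOURCE_OMNIRC = """\
OB2PORTRANGE=5555-5650
OB2IPCKEEPALIVE=1
OB2IPCKEEPALIVETIME=600
OB2IPCKEEPALIVEINTERVAL=60
OB2SHMIPC=0
OB2PORTRANGESPEC=5555-5650
OB2INETTIMEOUT=120
"""


def parse_omnirc(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments (assumed format)."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def diff_omnirc(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    # Hypothetical target file: 300 is a placeholder value for illustration only.
    target = SOURCE_OMNIRC.replace("OB2IPCKEEPALIVETIME=600", "OB2IPCKEEPALIVETIME=300")
    for key, (src, tgt) in diff_omnirc(parse_omnirc(SOURCE_OMNIRC), parse_omnirc(target)).items():
        print(f"{key}: source={src} target={tgt}")
```

Keeping a per-host record like this makes it easier to report to support exactly which combination was in place for each failed run.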

 

What's been done so far:

  • Expanded the D: and C: drives on the email server to give a higher percentage of free space, and defragmented them. This dropped the average disk queue length for the D: drive from 0.92-1.1 to 0.1.
  • Started with no omnirc settings on the email server (the two CentOS boxes had PortRange/KeepAlive/OB2SHMIPC), then added and removed different variations.
  • Changed timings on the VM NIC.
  • Checked and rechecked the firewall; no denies are coming up in the logs.
  • Stopped/restarted the Change Journal and upsized it for the D: drive.
  • Ran debug sessions, and set up and ran Wireshark on the target and Cell Server (I'm a little leery of running it on the production email server).

 

I thank you in advance for any thoughts/suggestions.  I'm submitting a ticket for this issue.  I'll repost the solution if it comes from HP.  

 

Thanks,

Tim

Additional info: the job's session messages:

[Normal] From: BSM@CELL.MANAGER "Email_Daily" Time: 9/29/2015 8:28:10 AM
Backup session 2015/09/29-12 started.

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:28:12 AM
STARTING Media Agent "DP1_DMF_Writer0"

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:28:12 AM
Deleting expired file depots from file library "DP1_DMF".

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:28:13 AM
Loading medium from slot /dpdisk/dp01dmf/2b00eb0a0560a83dd056240019f.fd to device DP1_DMF_Writer0

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:28:13 AM
/dpdisk/dp01dmf/2b00eb0a0560a83dd056240019f.fd
Initializing new medium: "DP1_DMF_MediaPool_38722"

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:15 AM
STARTING Disk Agent for EMAIL.SERVER:/D "D: [MAIL]".

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:15 AM
The Change Log Provider has been activated and will be used by the next incremental backup.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:16 AM
VSS option was specified. Attempting to create snapshot.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:16 AM
Entering snapshot definition phase.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:16 AM
Volume 'D:\' successfully added to snapshot set. The volume is now locked.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:16 AM
Exiting snapshot definition phase.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:28:17 AM
Creation of snapshot volume for D:\ succeeded. Proceeding with backup.

 

[Critical] From: BDA-NET@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
IPC failure reading NET message (IPC Read Error
System error: [10053] Software caused connection abort
) => aborting.

 

[Critical] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
Unexpected close reading NET message => aborting.

 

[Critical] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
Connection to Media Agent broken => aborting.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
Filesystem Statistics:

Directories ........ 13
Regular files ...... 42
------------------------------
Objects Total ...... 55
Total Size ......... 1.68 GB

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
Backup completed. Disconnecting from Volume Shadow Copy Service.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
Volume Shadow Copy successfully disconnected. Releasing the volume.

 

[Normal] From: VBDA@EMAIL.SERVER "D: [MAIL]" Time: 9/29/2015 8:30:09 AM
ABORTED Disk Agent for EMAIL.SERVER:/D "D: [MAIL]".

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:30:09 AM
Unloading medium to slot /dpdisk/dp01dmf/2b00eb0a0560a83dd056240019f.fd from device DP1_DMF_Writer0

 

[Normal] From: BMA@MEDIA.AGENT-B2D.STORAGE "DP1_DMF_Writer0" Time: 9/29/2015 8:30:14 AM
COMPLETED Media Agent "DP1_DMF_Writer0"

 

[Critical] From: BSM@CELL.MANAGER "Email_Daily" Time: 9/29/2015 8:30:15 AM
None of the Disk Agents completed successfully.
Session has failed.

 

[Normal] From: BSM@CELL.MANAGER "Email_Daily" Time: 9/29/2015 8:30:15 AM

Backup Statistics:

Session Queuing Time (hours) 0.00
-------------------------------------------
Completed Disk Agents ........ 0
Failed Disk Agents ........... 1
Aborted Disk Agents .......... 0
-------------------------------------------
Disk Agents Total ........... 1
===========================================
Completed Media Agents ....... 1
Failed Media Agents .......... 0
Aborted Media Agents ......... 0
-------------------------------------------
Media Agents Total .......... 1
===========================================
Mbytes Total ................. 1720 MB
Used Media Total ............. 1
Disk Agent Errors Total ...... 2
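The timestamps in this session allow a rough rate check: the snapshot was ready at 8:28:17 AM, the abort came at 8:30:09 AM, and "Mbytes Total" reports 1720 MB. The sketch below assumes all of that data moved within this window and that Data Protector's GB means 1024 MB (consistent with 1720 MB being shown as 1.68 GB); the 92.91 GB size of D: is taken from the local vbda run reported later in the thread.

```python
from datetime import datetime

# Timestamp format used in the session messages, e.g. "9/29/2015 8:28:17 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two session-log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Snapshot ready -> abort, plus "Mbytes Total" from the Backup Statistics.
seconds = elapsed_seconds("9/29/2015 8:28:17 AM", "9/29/2015 8:30:09 AM")
written_mb = 1720
# Size of D: from the local vbda run, assuming 1 GB = 1024 MB.
volume_mb = 92.91 * 1024

rate = written_mb / seconds
percent = 100 * written_mb / volume_mb
print(f"{seconds:.0f} s, {rate:.1f} MB/s, {percent:.1f}% of D:")
```

The resulting share of D: falls inside the 1% to 5% range described in the original post.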

  • My first thought was a firewall timeout, but since the job fails after just a couple of minutes, that shouldn't be it. OB2SHMIPC should be removed; it tends to do more harm than good in newer releases. How many files are on the D: drive? Possibly the change info exceeds the limits of the Change Log Provider and the Enhanced Incremental DB and causes the job to fail. I suggest setting up a job that saves only disk D:, without VSS or Enhanced Incremental. If that works without issues, enable VSS fallback again and retry. If the second job fails, check the VSS settings; maybe the snapshot area is restricted to an insufficient amount of space (our OS guys restrict it to 1 GB on many servers, which causes failures). PS: I like the amount of detail you included, appreciate it.

  • wrote:
    My first thought was a firewall timeout, but since the job fails after just a couple of minutes, that shouldn't be it. OB2SHMIPC should be removed; it tends to do more harm than good in newer releases. How many files are on the D: drive? Possibly the change info exceeds the limits of the Change Log Provider and the Enhanced Incremental DB and causes the job to fail. I suggest setting up a job that saves only disk D:, without VSS or Enhanced Incremental. If that works without issues, enable VSS fallback again and retry. If the second job fails, check the VSS settings; maybe the snapshot area is restricted to an insufficient amount of space (our OS guys restrict it to 1 GB on many servers, which causes failures). PS: I like the amount of detail you included, appreciate it.

    Thanks for the suggestion svollrat.

     

    I turned off OB2SHMIPC on all the machines yesterday and have consistently been getting 1% of the 90 GB backup. I also increased the drive size from 110 GB to 140 GB, as the drive was about 91% full; it is now 71% full.

     

    My coworker set the VSS allowed space on the drive to unlimited yesterday, trying to get it to work, with no change: still 1% before failure. I then dropped the VSS allowed space to 320 GB and then set it to 10% of the drive to try to clear any possible cache issue, according to this site. Running the job again got me to 6% before it threw the same error. No clear indicator in the Event Log.

     

    With VSS and Enhanced Incremental turned off, it got to 5% before it crashed.

     

    I'm in the process of gathering the data HP Support requested for my open case.  

  • Holy cow, looks pretty ugly. Nothing in the Event Log at all? No filesystem errors, no faulting binary? Have you tried taking the network out of the equation by running vbda locally (e.g. vbda.exe -vol D:\ -profile -out NUL)? How far does that get?

  • wrote:
    Holy cow, looks pretty ugly. Nothing in the Event Log at all? No filesystem errors, no faulting binary? Have you tried taking the network out of the equation by running vbda locally (e.g. vbda.exe -vol D:\ -profile -out NUL)? How far does that get?

    Event Log has been worthless. Errors only pertain to SMBWitnessClient that I left a share open to.  Closed now.

     

    vbda.exe -vol D:\ -profile -out

     

    [Normal] From: VBDA@SOURCE.EMAIL.SERVER "" Time: 9/30/2015 12:33:09 PM
    STARTING Disk Agent for SOURCE.EMAIL.SERVER:D:\ "".

    [Normal] From: VBDA@SOURCE.EMAIL.SERVER "" Time: 9/30/2015 12:48:00 PM
    Backup Profile:

    Run Time ........... 0:14:51
    Backup Speed ....... 106.78 MB/s

    [Normal] From: VBDA@SOURCE.EMAIL.SERVER "" Time: 9/30/2015 12:48:00 PM
    Filesystem Statistics:

    Directories ........ 554
    Regular files ...... 9915
    ------------------------------
    Objects Total ...... 10469
    Total Size ......... 92.91 GB

    [Normal] From: VBDA@SOURCE.EMAIL.SERVER "" Time: 9/30/2015 12:48:00 PM
    COMPLETED Disk Agent for SOURCE.EMAIL.SERVER:D:\ "".

    time: 891 (sec), tot: 97424566 (kB), done: 100 (%)

     

    Looking through all my session logs, I found that the CONFIGURATION backups (about 2.5 GB) from this source server have been succeeding. I had also backed up a 10 GB dummy file (created with FSUTIL) from the admin user's desktop with no issue. I'd been thinking the C: drive was fine, but I went ahead and tried to back up \Users, \Program Files, and \Program Files (x86) (about 17 GB), and that backup failed.  

     

    It really looks like the issue occurs when the second initialization of a medium in the B2D's DMF library begins. Every job on that VM seems to fail at that point.

     

    Got all the requested info out and am waiting to hear back from Support now.   
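As a cross-check, the local profile figures are internally consistent if Data Protector's units are binary (1 MB = 1024 kB): 97,424,566 kB in a run time of 0:14:51. This arithmetic only re-derives the numbers printed above; it is not a Data Protector calculation. It also shows the source disk reads far faster locally than the failed network session ran (1720 MB between 8:28:17 and 8:30:09 AM).

```python
# Figures from the local "vbda.exe -vol D:\ -profile" run above.
total_kb = 97_424_566
run_time_s = 14 * 60 + 51  # "Run Time ... 0:14:51"

# Assume binary units: 1 MB = 1024 kB, 1 GB = 1024 MB.
speed_mb_s = total_kb / 1024 / run_time_s
size_gb = total_kb / 1024 / 1024
print(f"{run_time_s} s, {speed_mb_s:.2f} MB/s, {size_gb:.2f} GB")
```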


  • The local backup with vbda run manually seems to be fine, so the issue is not this binary or the filesystem itself. You can try adding "-vss" to check whether that causes trouble. Next, please try a backup to a null device on your Media Server to verify that the network is stable. If that also works fine, I'd recommend checking your backup device. By the way, have you considered using the VE-Agent to back up the VM instead of the Disk Agent? With GRE and SmartCache, this might be the better solution.
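Alongside the suggested backup to a null device, a tool-independent check is to push a comparable volume of data over a single plain TCP connection from the email server (DMZ) to the media agent (LAN) through the ASA and see whether that stream is also cut off. The sketch below is a generic Python substitute for that idea, not a Data Protector feature; the host name, port, chunk size, and timeout are assumptions, and the port must be one the firewall permits. On Windows, a connection killed mid-stream typically surfaces in Python as ConnectionAbortedError [WinError 10053], the same Winsock code as in the session log.

```python
from __future__ import annotations

import socket
import threading
import time

CHUNK = 64 * 1024  # assumed chunk size


def receive_all(server: socket.socket) -> int:
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    return total


def send_bulk(host: str, port: int, total_bytes: int) -> tuple[int, float]:
    """Stream total_bytes of zeros; return (bytes sent, seconds elapsed).

    If the connection is aborted, reset, or stalls past the timeout,
    report how far the stream got instead of raising.
    """
    payload = bytes(CHUNK)
    sent = 0
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=120) as sock:
            while sent < total_bytes:
                n = min(CHUNK, total_bytes - sent)
                sock.sendall(payload[:n])
                sent += n
    except (ConnectionError, socket.timeout) as exc:
        print(f"stream stopped after {sent} bytes: {exc!r}")
    return sent, time.monotonic() - start


def loopback_selftest(total_bytes: int) -> tuple[int, int]:
    """Run both ends on 127.0.0.1; returns (bytes sent, bytes received)."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        received: dict[str, int] = {}
        worker = threading.Thread(target=lambda: received.update(n=receive_all(server)))
        worker.start()
        sent, _ = send_bulk("127.0.0.1", port, total_bytes)
        worker.join()
    return sent, received["n"]


if __name__ == "__main__":
    print(loopback_selftest(8 * 1024 * 1024))
```

For a real run, call receive_all(socket.create_server(("0.0.0.0", PORT))) on the media agent and send_bulk("MEDIA-AGENT-HOST", PORT, 90 * 1024**3) on the email server, where PORT and MEDIA-AGENT-HOST are placeholders. If a plain TCP stream also dies after a few GB, that would point at the network path rather than at Data Protector.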