DBCopy fails to copy all of offiles

Following our recent upgrade to GW12.0.2, our nightly DBCopy jobs aren't copying all of the contents of the offiles directories. We use DBCopy to copy our GW system to other OES11 SLES11.1 servers, where the copies are then backed up offline.

We are getting errors like the following in our dbcopy logfiles:

[272] Error: Failed to read next file in /root/F/PO06/offiles/fd15 rc=(8200) errno(4)
[272] Error: Error 8200 in performing CopyDirectory /root/F/PO06/offiles/fd15
or
* An error occured during copy rc(8200) errno(4) prev copied size (0) *
[096] Error: Return Code (8200) processing /root/F/PO06/offiles/fd13/3c4c410b.000
[096] Error: Failed to read next file in /root/F/PO06/offiles/fd13 rc=(8200) errno(4)
[096] Error: Error 8200 in performing CopyDirectory /root/F/PO06/offiles/fd13
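
To get a rough tally of which offiles directories are failing, and with which errno, log lines like these can be summarized with a short script. This is a minimal Python sketch that assumes the messages always follow the format of the samples above. The function names are hypothetical and are not part of dbcopy. If dbcopy reports the raw C errno, then errno 4 on Linux is EINTR (interrupted system call).

```python
import errno
import re
from collections import Counter

# Sample lines copied from the dbcopy logfile quoted above.
SAMPLE_LOG = [
    "[272] Error: Failed to read next file in /root/F/PO06/offiles/fd15 rc=(8200) errno(4)",
    "[272] Error: Error 8200 in performing CopyDirectory /root/F/PO06/offiles/fd15",
    "* An error occured during copy rc(8200) errno(4) prev copied size (0) *",
    "[096] Error: Return Code (8200) processing /root/F/PO06/offiles/fd13/3c4c410b.000",
    "[096] Error: Failed to read next file in /root/F/PO06/offiles/fd13 rc=(8200) errno(4)",
    "[096] Error: Error 8200 in performing CopyDirectory /root/F/PO06/offiles/fd13",
]

# Patterns inferred from the samples only (an assumption); other dbcopy messages may differ.
DIR_READ_FAILED = re.compile(r"Failed to read next file in (\S+) rc=\((\d+)\) errno\((\d+)\)")
FILE_FAILED = re.compile(r"Return Code \((\d+)\) processing (\S+)")


def summarize_dbcopy_log(lines):
    """Return (failures per directory, errno counts, list of individually failed files)."""
    dir_failures = Counter()
    errno_counts = Counter()
    failed_files = []
    for line in lines:
        match = DIR_READ_FAILED.search(line)
        if match:
            directory, _rc, err = match.groups()
            dir_failures[directory] += 1
            errno_counts[int(err)] += 1
            continue
        match = FILE_FAILED.search(line)
        if match:
            failed_files.append(match.group(2))
    return dir_failures, errno_counts, failed_files


def errno_name(code):
    """Symbolic errno name on the platform running this script (4 is EINTR on Linux)."""
    return errno.errorcode.get(code, "unknown")
```

To run this against a real log, pass in the open file, for example `summarize_dbcopy_log(open("dbcopy.log"))`. The filename here is hypothetical.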

If I then re-run the job, some more data is copied into the offiles directory and fewer errors appear in the dbcopy logfile. However, even on a relatively small PO, it took 10 runs for the whole directory to be copied over and for the size of offiles in the copied area to match that in live.

This has only happened since our upgrade from GW8.0.2 to GW12.0.2. Has anyone else come across this, or found a resolution?

Thanks
  • 9753595 wrote:

    > This has specifically happened since our upgrade from GW8.0.2 to GW12.0.2. Has
    > anyone else come across this/come up with a resolution?


    So, I suspect this is bigger than GroupWise. The errors you are reporting
    really seem to be "file system" errors. 8200 is specifically a "file I/O"
    error. It could mean that there is corruption on the disk. What is the file
    system here? Is /root/F really where these files exist, or is this a symlink?

    --
    Danita
    Novell Knowledge Partner
    GroupWise 2014 is just around the corner - will you be ready?
    http://bit.ly/cat-gw2014

  • Hi Danita,

    Thanks for your reply.

    A quick overview of our system:

    Our live GW system runs entirely on VMware in a Novell cluster (so NSS is the filesystem). We're running OES11 on SLES11.1. We dbcopy over to another site (the same infrastructure, without the clustering).

    We use a script to run dbcopy as a scheduled task on each destination server; nwmap is used to map F (in this instance) for the PO source in the dbcopy command. So yes, /root/F is the correct source of the files, and a lot of files (including all of ofuser
  • 9753595 wrote:

    > It 100% appears to be an issue since we've upgraded to GW12.0.2 as the
    > previous night's dbcopy runs were all successful with no errors, but once
    > we'd upgraded, we see the errors appear in all our main POs (7 of them).


    Did you update the version of dbcopy when you moved to 2012?

    gwuber:~ # rpm -qa | grep groupwise

    novell-groupwise-admin-12.0.2-108211
    novell-groupwise-dbcopy-12.0.2-108211 <==
    novell-groupwise-gwha-12.0.2-108211
    novell-groupwise-gwdva-12.0.2-108211
    novell-groupwise-gwia-12.0.2-108211
    novell-groupwise-agents-12.0.2-108211
    novell-groupwise-gwcheck-12.0.2-108211
    novell-groupwise-webaccess-12.0.2-108211
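
    To check this quickly on the server that actually runs dbcopy, you can parse the rpm listing and compare its dbcopy version with the agents version on the live POA server. This is a minimal Python sketch that assumes the plain name-version-release format shown above. Some rpm versions append an architecture suffix, which this sketch does not handle. The helper names are hypothetical.

```python
# Output of `rpm -qa | grep groupwise` as pasted above.
RPM_OUTPUT = """\
novell-groupwise-admin-12.0.2-108211
novell-groupwise-dbcopy-12.0.2-108211
novell-groupwise-gwha-12.0.2-108211
novell-groupwise-gwdva-12.0.2-108211
novell-groupwise-gwia-12.0.2-108211
novell-groupwise-agents-12.0.2-108211
novell-groupwise-gwcheck-12.0.2-108211
novell-groupwise-webaccess-12.0.2-108211
"""


def parse_rpm_list(text):
    """Map package name -> (version, release), assuming plain name-version-release lines."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so hyphens inside the package name are preserved.
        name, version, release = line.rsplit("-", 2)
        packages[name] = (version, release)
    return packages


def dbcopy_matches_agents(dbcopy_host, live_host):
    """True if the dbcopy version on one host equals the agents version on the other."""
    dbcopy = dbcopy_host.get("novell-groupwise-dbcopy")
    agents = live_host.get("novell-groupwise-agents")
    if dbcopy is None or agents is None:
        return False
    return dbcopy[0] == agents[0]
```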

    Do you, by accident, have multiple dbcopies, gwchecks, or a TSA / SMS backup running at the same time?

    Please look at TID 7014010 and see if that is helpful; it concerns superfluous files in the MTA and POA home folders. But I don't think this is your issue.

    And of course, how are you invoking dbcopy? Please give us your command line. We might see something.

    Regarding suspected I/O errors: anything like a real I/O error (disk damage, SAN spasms, etc.) would show up in the kernel ring buffer when you run dmesg, and likely in lots of other places such as /var/log/messages. This is also NSS, so you would see NSS pitching a fit. Also look for the 8200 errors in the POA logs: are you seeing them for these same files when dbcopy is NOT running, for example during a GWCHECK?

    Note also that offiles is very dynamic, especially when purges are occurring or when gwcheck is grooming the blobs. So it may be perfectly normal for blobs to have been deleted during the time it took to enumerate and copy the rest of them. You need to quantify the number of files not being copied. If these were databases (ofmsg / ofuser), I would be VERY concerned, but offiles is dynamic enough that the file dbcopy is trying to read may have gone away by the time it reads it. So it is critical to establish how many files are affected, and which ones.
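
    One way to measure "how many and which ones" is to compare the file lists of the live offiles tree and the copy, rather than only their du totals. This is a minimal Python sketch that assumes both trees are reachable from one machine, for example the nwmap-mapped /root/F/PO06/offiles and the local copy. The paths and function names are illustrative assumptions. Files that exist only in the copy may be blobs that were purged from live after the copy ran.

```python
import os


def file_sizes(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                sizes[os.path.relpath(full_path, root)] = os.path.getsize(full_path)
            except FileNotFoundError:
                # offiles is dynamic: a blob can be purged between listing and stat.
                pass
    return sizes


def compare_offiles(source_root, copy_root):
    """Return (files missing from the copy, bytes they account for, files only in the copy)."""
    source = file_sizes(source_root)
    copy = file_sizes(copy_root)
    missing = sorted(path for path in source if path not in copy)
    missing_bytes = sum(source[path] for path in missing)
    only_in_copy = sorted(path for path in copy if path not in source)
    return missing, missing_bytes, only_in_copy
```

    If you run this right after each dbcopy pass, you can check whether the missing list shrinks from run to run and whether it stays confined to particular fdXX directories.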

    -- Bob
  • Bob-O-Rama wrote:

    > You need to quantify the number of files not being copied. If these were
    > databases ( ofmsg / ofuser ) I would be VERY concerned. But offiles is
    > dynamic enough that the file its trying to read has gone away by the time you
    > are reading it. So the quantity of the files and which ones is critical to
    > define.


    This is true too. I did a migration this weekend where I had a handful of these
    errors, and when I actually went and looked for the files in question, indeed
    they were not in the source. So this is possible, especially if dbcopy is
    being run during hours when users are typically active (which these days is just
    about always <g>). When I ran my FINAL dbcopy with the POA down, I saw no errors
    on the copy.
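
    The check described above can be scripted. For each file that dbcopy reported as failed, see whether it still exists in the source. This is a minimal Python sketch that assumes it runs soon after the dbcopy job, on the server where the source is mapped, and uses the paths exactly as they appear in log lines such as "Return Code (8200) processing ...". The function name is hypothetical. A file that is gone was probably purged mid-copy. A file that is still present points to a genuine read failure worth investigating.

```python
import os


def classify_failed_files(paths):
    """Split dbcopy-reported paths into (gone from source, still present in source)."""
    gone, still_present = [], []
    for path in paths:
        # Checked at run time, so a blob purged after dbcopy finished also counts as gone.
        if os.path.exists(path):
            still_present.append(path)
        else:
            gone.append(path)
    return gone, still_present
```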

    --
    Danita
  • Hi Bob,

    Thanks for your reply.

    Novell's latest suggestion was to try reverting to the old 8.0.2 dbcopy that we previously used, as we're not running any agents on the destination servers (we use them for potential GW DR and for offline backup of GW).

    I'm not entirely comfortable with this, but I tried it on our test system, where it didn't appear to cause any issues. I've now tried it on the smallish PO, and it copied everything over with no errors, as it did before the upgrade.

    I was just wondering whether you know of any reason we might be concerned about running an old dbcopy against a newer live system?

    Thanks
    Mark.
  • Hi Danita,

    I'm aware of the transient nature of the contents of offiles, but what's been happening, for example, is that 18 of some 2000 have been getting copied over on the first run of dbcopy, and this number has increased on each subsequent run. I've been running du -h on the offiles directory of both live and the copy to compare their sizes.

    Thanks for your input on this. I've replied to Bob above, but thought I'd reply to you too in case you missed it. In short: Novell's latest suggestion was to revert to the old 8.0.2 dbcopy that we previously used, since we're not running any agents on the destination servers. I'm not entirely comfortable with this, but it caused no issues on our test system, and on the smallish PO it has now copied everything over with no errors, as it did before the upgrade. Do you know of any reason we might be concerned about running an old dbcopy against a newer live system?

    Thanks
    Mark.
  • 9753595 wrote:

    > I'm not entirely comfortable with this


    No problem at all. There is no reason why an older version can't be used in
    this case.

    --
    Danita
  • Great, thanks Danita, and thanks again for your input on this.