Default POA Maintenance - Client User Latency High

Our GroupWise system runs the default POA Analyze/Fix weekly. This gwcheck routine takes roughly 16 hours to complete. About 9 hours into this gwcheck maintenance, system performance degrades to the point where our GroupWise clients become unresponsive.

We ran this same default POA Analyze/Fix maintenance routine when the GroupWise system was running version 2012, without any client degradation.

Looking back through our analytics, we can see that this high CPU / disk I/O load started at the point when we upgraded the system from 2012 to 2014.

None of the physical server properties changed when we upgraded to 2014.

Our GroupWise server is an OES11 SP2 VM running on an IBM Blade Server. The GW VM has 8 GB of RAM and 2 processors. The Post Office and Domain are both hosted on this server on NSS SAN-attached volumes.

The Analyze / Fix has the following Actions checked: Structure/Index, Content, Fix Problems. The database items checked are: User/Resources, Messages, Document.

Is the 2014 maintenance process doing something different from 2012 that would cause this high CPU and disk I/O? Are there parameters we could modify to keep utilization under control during this maintenance check? Should I be looking at adding more CPU power or memory?

  • Hi,

    Whilst I don't know why your maintenance task is "killing" your server, I would like to make recommendations to adjust the default maintenance task based on my experience.

    Firstly, do not run both a Structure and a Content check simultaneously. Doing so is asking for trouble!

    My maintenance routines are set as follows:

    Daily, 11:00pm, Structure check against all databases.
    Weekly, Saturday 2:00am, Content check against all databases.
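
    The point of this schedule is that the two checks never run at the same time. Below is a minimal Python sketch that models one week of runs and lists any windows that intersect. It assumes the daily Structure run takes about 10 minutes and the Saturday Content run takes up to 4 hours (the timings reported further down this thread); the placeholder dates and the helper names are hypothetical, and the sketch only models timing; it does not talk to GroupWise.

```python
from datetime import datetime, timedelta

# Placeholder week: 1 Jan 2024 was a Monday. Only weekday and time matter.
WEEK_START = datetime(2024, 1, 1)

def weekly_windows(structure_minutes=10, content_hours=4):
    """Return (label, start, end) tuples for one week of the schedule above.

    The durations are assumptions based on timings reported in this thread.
    """
    windows = []
    for day in range(7):
        # Daily Structure check at 11:00 PM
        start = WEEK_START + timedelta(days=day, hours=23)
        windows.append(("Structure", start, start + timedelta(minutes=structure_minutes)))
    # Weekly Content check on Saturday (day 5 counting from Monday) at 2:00 AM
    content_start = WEEK_START + timedelta(days=5, hours=2)
    windows.append(("Content", content_start, content_start + timedelta(hours=content_hours)))
    return windows

def overlaps(windows):
    """Return (label, start, label, start) for every pair of intersecting windows."""
    clashes = []
    for i, (a_label, a_start, a_end) in enumerate(windows):
        for b_label, b_start, b_end in windows[i + 1:]:
            # Two half-open ranges intersect when each starts before the other ends
            if a_start < b_end and b_start < a_end:
                clashes.append((a_label, a_start, b_label, b_start))
    return clashes

print(overlaps(weekly_windows()))  # → []
```

    Passing `content_hours=22` makes the Content window run into Saturday's 11:00 PM Structure run and reports one clash, which is the situation this schedule is meant to avoid.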

    I never recommend running maintenance during work hours; even on previous versions of GroupWise, the system will always take a "hit".

    Perhaps telling us how many users you have and how large your post office is (in GB/TB) will give us a few more clues as to why it is taking so long.

    You can adjust how many threads are dedicated to the GWCheck process, but I have never had to adjust them in my entire career with GroupWise.

    Cheers,
  • We have 280 users and a 400 GB Post Office.

    We have two Libraries defined for document storage, which appear to total about 500 MB combined.

    The maintenance starts at 11:30 PM Saturday, and at around 8:30 AM Sunday we can see the disk I/O and CPU get intense. The maintenance completes around 3:30 PM Sunday. The math doesn't seem to add up based on the start and finish times, but the poacheck log states that the total job processing time is 16 hours.

    There are no backups running Saturday or Sunday.

    We are an ISP / telecommunications company, so we have a full staff in our Network Operations Center and Help Desk 24x7. Although the scheduled maintenance window affects the fewest of our users, it affects the two departments most dependent on GroupWise.

    I thought about increasing the gwcheck schedule queue, which is at the default setting of 4, but I fear that may have a negative impact rather than a positive one.

    Can you be more specific about exactly what you recommend be checked for each scheduled event (e.g. Fix Problems, Index Check, etc.)?

    Is there a document reference that clarifies what should be selected, the reasons for each option, and the dangers?

    Thanks for your help and suggestions Laura!
  • Hi,

    With regard to the process time reported in the log file, there are two entries at the bottom of the log. One states the physical time the routine took, in actual minutes; the other states a calculated "virtual" time which, according to my logs, is about 4 x the actual minutes. The second figure is calculated from the time each thread spent processing the request, adding all of that time together to give an "inflated" time to complete the job. Consider that the "virtual" time; it's the first one that you are interested in.
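
    The difference between the two figures can be illustrated with a small Python sketch. The per-thread busy intervals below are invented for illustration (four threads is an assumption, not a value from any log); the point is only that adding up each thread's busy time counts parallel work several times over, so the "virtual" total approaches the thread count multiplied by the actual elapsed time.

```python
# Hypothetical busy intervals per gwcheck thread, in minutes from job start.
# Real values would come from the log; these are made up for illustration.
thread_intervals = {
    "thread-1": [(0, 55), (60, 120)],
    "thread-2": [(5, 118)],
    "thread-3": [(2, 50), (52, 115)],
    "thread-4": [(10, 110)],
}

def actual_minutes(intervals_by_thread):
    """Wall-clock span: from the earliest start to the latest finish."""
    starts = [s for spans in intervals_by_thread.values() for s, _ in spans]
    ends = [e for spans in intervals_by_thread.values() for _, e in spans]
    return max(ends) - min(starts)

def virtual_minutes(intervals_by_thread):
    """Busy time summed across all threads, which double-counts parallel work."""
    return sum(e - s for spans in intervals_by_thread.values() for s, e in spans)

actual = actual_minutes(thread_intervals)
virtual = virtual_minutes(thread_intervals)
print(actual, virtual, round(virtual / actual, 1))  # → 120 439 3.7
```

    With a single thread the two numbers would be identical; the ratio grows with the number of threads kept busy at the same time.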

    One of my customers has a system roughly the same size as yours (257 users, about 350 - 400 GB of data), so I'll give you time estimates based on what is running on their site.

    Daily Structure check takes 10 minutes
    Weekly Content check takes 3 - 4 hours

    Before I took over the above-mentioned site, the maintenance routines had not been set to my preferences (if that is what you want to call them); the first Content check that I ran for them took more than 10 hours to complete, and the box ground to an absolute halt.

    For each check mentioned above, I check all the boxes that apply to the type of check I'm running, e.g. Fix Problems on both, Update Disk Totals on Content, etc.

    I would recommend that you start by separating the Structure and Content check as per my recommendations above. Seeing as it's Friday today, perhaps you want to make these adjustments and see how your weekend goes?

    I do not have a specific TID or document reference on my recommendations, though I will look to see if I can find anything for you.

    Hope that this helps.

    Cheers,
  • Hi,

    I've found a reference to preferred scheduled maintenance on a GroupWise system. The wiki is for GroupWise 8, but the same principles have applied since the GroupWise 4 days.

    https://wiki.microfocus.com/index.php?title=GroupWise_Maintenance

    Scroll down to the section titled Scheduled Maintenance, and points 1
  • I split the Structure and Content jobs as suggested.

    The Daily Structure check takes roughly 40 minutes to complete.

    The Weekly Content check started Saturday evening at 11:30 PM and completed Sunday evening at 8:00 PM (roughly 20 hours).

    It appeared that one file was stuck in the gwcheck queue until 1:30 PM Sunday; then the gwcheck queue moved past that file and took off processing the message files, showing 600 files in the queue.

    I suspect that, had the process not been stuck on that one file, the Content check would have taken roughly 7-8 hours.
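
    The timings above can be broken down with a quick Python calculation. Only the clock times come from the post; the calendar date is a placeholder. The total works out to about 20.5 hours, with roughly 6.5 hours of processing after the queue moved past the stuck file, which fits the 7-8 hour estimate if some work was also done before the stall.

```python
from datetime import datetime

# Clock times from the post; the date is a placeholder (6 Jan 2024 was a Saturday).
start = datetime(2024, 1, 6, 23, 30)    # Saturday 11:30 PM, Content check starts
unstuck = datetime(2024, 1, 7, 13, 30)  # Sunday 1:30 PM, queue moves past the file
finish = datetime(2024, 1, 7, 20, 0)    # Sunday 8:00 PM, Content check completes

def hours(delta):
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600

print(hours(finish - start))    # → 20.5 (total elapsed)
print(hours(unstuck - start))   # → 14.0 (time until the queue moved on)
print(hours(finish - unstuck))  # → 6.5 (processing after the stall)
```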

    Now if I can just make sense of the results:

    CODE DESCRIPTION COUNT
    ---- -------------------------------------------------- -----
    8 Errors reading user databases...................... 1
    50 Orphaned Blob files (deleted)...................... 8
    83 Items that have failed to archive.................. 1
    Correctable conditions encountered:
    CODE DESCRIPTION COUNT
    ---- -------------------------------------------------- -----
    19 MSG host recs, LIN_DRN pointing to missing record.. 247
    30 Folder records found with improper parents......... 1
    39 Unrecognized or invalid files in mail directories.. 5
    66 GWCheck log files in log directory................. 2
    67 Outdated execution records.(notify/alarm).......... 16003
    78 Item records with invalid fields................... 38
    79 Duplicate folder names at same level............... 126
    82 Inaccessible attachment files...................... 1
    91 Databases in Store Catalog but not on disk.......... 21
    93 Unused blob files (deleted)........................ 119
    102 Contact folders out of sync........................ 7
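
    One way to make sense of a summary like this is to separate the two groups and see which conditions occur most often. Below is a minimal Python sketch, assuming the summary rows look exactly like the excerpt above (code, description padded with dot leaders, count); real gwcheck logs may format them differently, and only a subset of rows is included. The first group is treated as "problems" simply because it comes before the "Correctable conditions encountered:" header.

```python
import re

# A subset of the summary rows from the excerpt above.
SUMMARY = """\
8 Errors reading user databases...................... 1
50 Orphaned Blob files (deleted)...................... 8
83 Items that have failed to archive.................. 1
Correctable conditions encountered:
19 MSG host recs, LIN_DRN pointing to missing record.. 247
67 Outdated execution records.(notify/alarm).......... 16003
79 Duplicate folder names at same level............... 126
"""

# CODE, then the description (trailing dot leaders stripped), then COUNT.
ROW = re.compile(r"^\s*(\d+)\s+(.*?)\.*\s+(\d+)\s*$")

def parse_summary(text):
    """Split rows into (problems, correctable) lists of (code, description, count)."""
    problems, correctable = [], []
    current = problems
    for line in text.splitlines():
        if line.strip().startswith("Correctable conditions"):
            current = correctable  # rows after this header are correctable
            continue
        match = ROW.match(line)
        if match:
            code, desc, count = match.groups()
            current.append((int(code), desc, int(count)))
    return problems, correctable

problems, correctable = parse_summary(SUMMARY)
# List correctable conditions, most frequent first
for code, desc, count in sorted(correctable, key=lambda row: -row[2]):
    print(f"{code:>4} {count:>6}  {desc}")
```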

    Any suggestions?
  • Hi

    If possible I would like to take a look at your log file. I've sent you a Private Message with more details.

    Thanks in advance.

    Cheers,
  • In article <dkerbaugh.75diy0@no-mx.forums.microfocus.com>, Dkerbaugh
    wrote:
    > 8 Errors reading user databases...................... 1
    > 50 Orphaned Blob files (deleted)...................... 8
    > 83 Items that have failed to archive.................. 1


    These are the ones that need to be fixed and will take some manual
    intervention.
    - Error 8 is the most problematic. Search the log for "Error 8 " to
    find who is affected. Check whether that user can get into their
    account and whether that file is still a problem to read. You will
    likely have to run a few GWChecks against just that user to sort this
    out. Do you have any antivirus on this system? Or possibly a client
    machine mapped to it that is scanning/locking files?
    - Error 83 likely relates to someone's archive settings being invalid,
    so searching for "Error 83" will point you to that user so you can
    verify their archive settings.
    - Error 50 instances sometimes clear themselves. Just compare each
    instance with the next Contents GWCheck to make sure they clear;
    otherwise, dig deeper.
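
    The trailing space in "Error 8 " matters because "Error 8" is also the start of "Error 83". A minimal Python sketch of that search, using hypothetical log lines (the real gwcheck log wording may differ) and a hypothetical helper name:

```python
# Hypothetical log lines; real gwcheck log wording may differ.
LOG_LINES = [
    "Error 8 reading user database for user JSMITH",
    "Error 83 item failed to archive for user ABROWN",
    "Problem 30 folder record with improper parent for user CJONES",
]

def find_entries(lines, kind, code):
    """Return lines containing e.g. 'Error 8 '; the trailing space stops 8 matching 83."""
    needle = f"{kind} {code} "
    return [line for line in lines if needle in line]

print(find_entries(LOG_LINES, "Error", 8))
```

    Against a real log, the same function can be given an open file object, since iterating over a file yields its lines.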

    For the next set of problems, search for each one by number, e.g.
    "Problem 30", to look at the individual instances. Problems 79, 91, and
    102 are the ones that need attention first and have known fixes.
    For more on this process, I've written it up at
    http://www.konecnyad.ca/andyk/gwlogs.htm
    along with other useful GW Maintenance bits that are worth your
    checking out.



    Andy of
    http://KonecnyConsulting.ca in Toronto
    Knowledge Partner
    http://forums.novell.com/member.php/75037-konecnya