GW User Move - wpcsout/ofs/6 processed very slowly

I am moving user mailboxes to a new PO (on a new server). On the destination PO, in the wpcsout/ofs/6 queue, there are about 170k messages waiting to be processed. Every 1-2 seconds the count decreases by only 1-2, and every now and then it increases by another 100-200. At the same time, the gwpoa is at near-100% CPU.

I've checked disk performance on the destination machine - 226MB/s writes, 549MB/s reads - which looks OK.

The destination machine is a SLES12 VMware VM with 16 GB RAM (all reserved) and 4 CPUs (gwpoa keeps one of them at near-100%).

I tried increasing the Message Worker Threads (now 18), the C/S Handler Threads (now 80), and the Max Thread Usage for Priming and Moves (now 60%).


Any ideas how to speed it up? I don't see any errors in the verbose log, and users are not complaining about performance, but I have been moving just a few accounts for more than a week now...




  • On the source POA log, I do see some errors:

    "Could not '_NgwrepFixItem' (53288 0x0000d028): .......
    ** Error Replicating Message: D028 - 40

    No idea if this is related somehow to the slow processing of the destination queue...
  • This one generally still holds true


    Please note the possible delays involved (appendix).



  •  wrote:

    Any ideas how to speed it up?

    Before knowing how to resolve your issue, the cause has to be correctly diagnosed. That is likely to require some effort. In the meantime, all we can do is offer some general suggestions.

    Have a look at the SUSE Linux Enterprise Server 12 SP4 System Analysis and Tuning Guide. In particular, read the section 12.2 Available I/O Elevators. Some VMware users have seen a significant performance improvement when using the NOOP elevator.
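    To see which elevator is currently active, you can read it straight from sysfs; a minimal sketch (the /sys paths assume a non-multiqueue SLES12 kernel, and sda is an assumed device name):

```shell
# Print the active I/O elevator for every block device.
# The active scheduler is the one shown in brackets, e.g. "noop [cfq] deadline".
pick_active() { sed -n 's/.*\[\(.*\)\].*/\1/p'; }

for f in /sys/block/*/queue/scheduler; do
  [ -e "$f" ] || continue
  printf '%s -> %s\n' "$f" "$(pick_active < "$f")"
done

# Switch a device to noop at runtime (sda is an assumption; needs root):
#   echo noop > /sys/block/sda/queue/scheduler
# To make it persistent, add elevator=noop to the kernel command line in GRUB.
```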

    I assume your domain and post office are not on a BTRFS volume. That would be a bad idea.

    For best performance your domain and post office should reside on a dedicated volume connected via a dedicated storage adapter. 

    While you report read and write throughput, a better indication of how your IO subsystem is performing would be the maximum number of IOPS supported.
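    If fio is installed, a short random-read job gives you an IOPS figure directly. This sketch only prints the suggested command; the /gwpo directory and the job parameters are assumptions to adapt to your post office volume:

```shell
# Build a fio random-read job that reports IOPS; --direct=1 bypasses the
# page cache so the number reflects the storage, not RAM.
FIO_CMD="fio --name=iops-test --directory=/gwpo --rw=randread --bs=4k \
--size=1G --iodepth=32 --ioengine=libaio --direct=1 --runtime=30 --time_based"
echo "Run on the destination server: $FIO_CMD"
```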

    Poor IO performance and low CPU utilization would be an indication of high IO latency. That does not appear to be your issue but you can check the IO latency for your VM via the VMware web client.

    You can gain a small CPU performance increase by configuring one processor with four cores rather than two processors each having two cores.

    You say (gwpoa utilizes one of them on near-100%). How do you determine that the gwpoa is using a specific core?

    Are you sure your VM isn't resource constrained? You will have to look at your host performance...

    IO contention is another possibility. Do your source and destination post offices reside in the same VMware datastore?

    Do your source and destination GroupWise virtual servers run on the same host? That could eliminate data transfers across the physical wire and provide better performance.

    Sometimes TSO can impact performance. Try enabling/disabling it: Enable or Disable TSO on a Linux Virtual Machine.
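    Checking the current TSO state can be done with ethtool; a sketch (eth0 is an assumed interface name):

```shell
# tso_state extracts the tcp-segmentation-offload value from `ethtool -k` output.
tso_state() { grep '^tcp-segmentation-offload:' | awk '{print $2}'; }

# On a live system (eth0 assumed; skipped if ethtool or the interface is absent):
if command -v ethtool >/dev/null 2>&1 && [ -e /sys/class/net/eth0 ]; then
  ethtool -k eth0 | tso_state
fi

# Toggle it at runtime (needs root):
#   ethtool -K eth0 tso off
#   ethtool -K eth0 tso on
```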

    ...and the list goes on!

    Thank you for the reply! I've tried changing the elevator to NOOP (the destination machines are virtual, SLES12), but this didn't lead to any speedup (at least not a noticeable one). The user mailboxes being moved from one PO to the other are quite big (tens of GBs) and contain a lot of emails, so I guess this is the main reason there are huge numbers of queued messages. Currently, there are a lot of messages queued in the mslocal/..../2 queues of the MTAs involved. They are being processed (I can see this with ls -l in the "2" folder) and at the same time new ones are being put there, so I guess it's a matter of time. One of the MTAs had 1.2 million messages in this queue yesterday; today when I checked, they were down to 800k and counting.

    Otherwise, performance of the VM host is quite OK. The source POs are on physical machines. Destination filesystem is XFS (source is NSS). Destination machines are on VMware, with VMXNET3 adapters, so TSO should be enabled by default.


    Doing a GroupWise migration across the wire using dbcopy can take days just to copy the data, and then you still have to process the data at the destination. I can certainly understand why it can be slow.

    If I understand you correctly, your source server is a physical machine and the destination server is a SLES12 VMware VM.

    Let's assume for the moment that your destination VM is in good shape. The first step in improving performance is identifying exactly where the bottleneck is.

    What version of GroupWise are you running?

    Your source server holds the primary domain while your destination server holds a secondary domain, MTA, and the destination post office, is that correct?

    Are you only moving a few users at a time?

     referred you to the User Move Within GroupWise document. It contains a lot of good information. Have you gone through it? Perhaps Mathias has some additional ideas?

    Your issue certainly isn't black and white so I'd like to invite Massimo  and Andy  to join the discussion. Both these Knowledge Partners have extensive GroupWise experience and may have additional insight.

  •  wrote:

    Destination machines are on VMware, with VMXNET3 adapters, so TSO should be enabled by default.

    You may want to read this:

    Poor TCP performance might occur in Linux virtual machines with LRO enabled (1027511)
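    Per that KB, you can check and disable LRO from inside the guest with ethtool; a sketch (eth0 is an assumed interface name):

```shell
# lro_state extracts the large-receive-offload value from `ethtool -k` output.
lro_state() { grep '^large-receive-offload:' | awk '{print $2}'; }

# On a live system (eth0 assumed; skipped if ethtool or the interface is absent):
if command -v ethtool >/dev/null 2>&1 && [ -e /sys/class/net/eth0 ]; then
  ethtool -k eth0 | lro_state
fi

# Disable LRO at runtime (needs root):
#   ethtool -K eth0 lro off
```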

  •  wrote:
    On the source POA log, I do see some errors:

    "Could not '_NgwrepFixItem' (53288 0x0000d028): .......
    ** Error Replicating Message: D028 - 40

    as per

    D028   Lost attachment

    Source:  GroupWise engine.

    Explanation:  Attachment could not be associated with a message.

    Possible Cause:  The attachment pointer was damaged.


    A GWCheck contents check would likely have sorted these out in advance. At this point I don't think there is anything to do about them, or that they are part of the slowness, given that the messages have already made it over to the destination (which is why you want to run it regularly, as I outline in ). The diagram shows that the messages have made it to the POA but are still working their way into the users' mailboxes (I wish I could find newer versions of those diagrams, but since that flow hasn't really changed, the old ones still work).

    Are you doing these moves just one user at a time, or hitting it with many? More is not merrier in this case.

    What version(s) of GroupWise are you using? How many users are there already on this system?

    How much swap is being used (free -m)? If it is all in use, that is an indication that you need more RAM.
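    A quick way to put a number on that is to parse the Swap line of free -m; a sketch:

```shell
# swap_used_pct reads `free -m` style output on stdin and prints the
# percentage of swap in use (0 if there is no swap configured at all).
swap_used_pct() { awk '/^Swap:/ { if ($2 > 0) printf "%d\n", $3 * 100 / $2; else print 0 }'; }

# On a live system (skipped if free is not available):
if command -v free >/dev/null 2>&1; then
  echo "swap used: $(free -m | swap_used_pct)%"
fi
```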

    Have you checked the destination POA web console to monitor the different thread usage (is anything maxing out?) and what the Quick Indexer is up to? http://poa_server_address:7181
    You will want to keep on top of your Quick Indexer, as its indexes need to be rebuilt on this new POA as the messages are moved in.
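    To see whether the backlog is actually shrinking, counting the files in the queue directory is enough; a sketch (/gwpo is an assumed post office path, adjust to yours):

```shell
# count_queue prints the number of files waiting in a queue directory.
count_queue() { find "$1" -type f 2>/dev/null | wc -l; }

count_queue /gwpo/wpcsout/ofs/6

# Repeat every 10 seconds to watch the trend:
#   watch -n 10 'find /gwpo/wpcsout/ofs/6 -type f | wc -l'
```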


    Did you run GWChecks before starting the move operations? Moves often unveil issues (such as lost attachments) which have been around for a long time but haven't been noticed by anyone, because the items in question haven't been touched for years. Your indexes should be up to date beforehand; I've seen instances where they had been broken for 2 years and were 300,000 items behind.

    How is the target VMDK provisioned? For GW operations such as migrations or moves, the way blocks are allocated on anything but thick-eager-provisioned VMDKs can slow things down significantly. And even if you've provisioned thick-eager, you could still have a storage backend which virtualizes too much, so that allocations actually behave like they do on thin disks.


  • Hi,

    Here's a quick video showing a good way to test the performance of the disk system holding your GroupWise POA:
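    In the same spirit, a crude sequential-write test can be done with plain dd; a sketch (it writes to a temp dir here for safety, so point TESTDIR at the PO volume, and drop the stderr redirects to see dd's throughput figure):

```shell
# Write 64 MB with O_DIRECT where supported (tmpfs rejects it, hence the
# fallback without the flag), then report the size written.
TESTDIR=$(mktemp -d)
dd if=/dev/zero of="$TESTDIR/ddtest" bs=1M count=64 oflag=direct 2>/dev/null \
  || dd if=/dev/zero of="$TESTDIR/ddtest" bs=1M count=64 2>/dev/null
SIZE=$(wc -c < "$TESTDIR/ddtest")
echo "wrote $SIZE bytes to $TESTDIR/ddtest"
rm -rf "$TESTDIR"
```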




    Thanks to everyone for the replies! Unfortunately, the moves eventually completed before I could identify the main reason(s) for the slow performance. I didn't get to try the LRO settings, but I suppose the slowness was due to a combination of the following:

    - very large mailboxes with many (hundreds of thousands) messages

    - thin-provisioned destination disks (although performance tests were showing excellent results...)

    - sporadic dying of the agents, which I found out to be due to the Hardware Lock Elision issue

    - something else that was not identified


    Well, since all moves completed, the system is now tip-top.