Proactively testing your GroupWise system's I/O performance
We've all taken the call; you know, the one from your CxO that GroupWise is slow, the old needle-in-a-haystack call. Looking through the logs on the POA yields nothing, and testing with a test account on the CxO's post office doesn't show the symptom that's been reported, yet you're taking calls at the helpdesk that things are slow. CPU looks good, no backup is running, the network team says everything looks good... there's no obvious cause. Now what?!?
I/O performance testing
We all want to avoid "the call". Performing some simple tests, and keeping the results from when "things are good", can go a long way toward identifying whether or not disk I/O performance is contributing to slowness. Here are some simple tests that should show what sustained disk I/O you can expect from your system.
Note: dd can be destructive; carefully review your command before executing it.
The first command syncs (i.e., forces everything that's currently cached to be written to disk), then copies a string of zeros in blocks of 1024 KB, 10,000 times, to a file named /media/nss/GW/testfile. The second command then copies the testfile to the device /dev/null (the bit bucket in the sky).
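In command form, that looks like this. The write test matches the sample output further down; the read test is a sketch along the same lines, with the same assumed path and block size:

# Sustained write test (this is the command shown in the sample results below):
sync; time -p dd if=/dev/zero of=/media/nss/GW/testfile bs=1024k count=10000
# Sustained read test: read the file just written back out to /dev/null.
sync; time -p dd if=/media/nss/GW/testfile of=/dev/null bs=1024k
# For an honest read test, consider clearing the page cache first (e.g., as
# root: echo 3 > /proc/sys/vm/drop_caches); otherwise you may measure RAM,
# not disk. And per the warning above: of= must point at a file, never at a
# block device such as /dev/sda; that is what makes dd destructive.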
What you should see:
If you have decent hardware and it's not a busy time for disk access, you should see throughput numbers that exceed 200 MB/s on the sustained write test and 300 MB/s on the read test. Re-run the test a few times, then average the returned values to get a better picture of what the throughput numbers are when things are good. If you don't see at least these numbers, you'll want to closely watch any POA hosted on this server for signs of degradation in service delivery, or choose a different server to host your POA.
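If you'd like to automate the re-run-and-average step, something along these lines should work with GNU dd (as shipped with SLES, which prints the MB/s figure on stderr, as in the sample results below). This is a minimal sketch that assumes the same test path and sizes as above:

for i in 1 2 3; do
  sync
  # Run the write test and keep only the MB/s number from dd's summary line.
  dd if=/dev/zero of=/media/nss/GW/testfile bs=1024k count=10000 2>&1 \
    | awk '/copied/ {print $(NF-1)}'
  rm -f /media/nss/GW/testfile   # clean up between runs
done | awk '{sum+=$1; n++} END {printf "average: %.1f MB/s over %d runs\n", sum/n, n}'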
Sample results:

smoring@slowpoke:~/Desktop> sync;time -p dd if=/dev/zero of=/media/nss/GW/testfile bs=1024k count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 422.616 s, 24.8 MB/s
real 422.62
user 0.02
sys 13.64
As you can see from this example, the disk throughput during the sustained write test is well under 200 MB/s. A well-performing GroupWise system requires enterprise-class disk throughput; without it, you'll be taking evasive action when the CxO comes looking for you...
I tried running the read and write tests on a SLES 11 SP1 x64 server running as a vSphere 4.0 VM connected to an EMC SAN and got pretty bad numbers: on average, after 3 runs, I was seeing 92.4 MB/s for writes and 71 MB/s for reads! Why would reads be slower than writes?
Also, do you know of an equivalent way to test a Windows GW server so I could compare performance of Windows to Linux?