During BrainShare 2008, the most common question that the GroupWise team and the GroupWise SysOps were asked was, "What file system should I use for Linux?" This article explains the choices and why I reach the conclusion that I do ...
Solution
There are 4 (or 5) choices of file system for GroupWise 7 on SLES10 or OES2: XFS, EXT3 (without and with H-Tree), ReiserFS, and NSS.
Let's look at these in turn and then explain their advantages and disadvantages.
XFS
XFS is extremely fast, but it uses some very aggressive caching to achieve that throughput, and the way it manages the cache is questionable. For example, try formatting a USB stick with XFS, then copy a large file to the USB stick. When the copy has finished, unmount (umount) the drive. When you next plug the USB stick in, is the file readable? (In 10 test attempts, I never once had the copy complete.) This makes it totally unsuitable for a cluster, and probably too vulnerable for a standalone GroupWise 7 server. Discounted.
EXT3
EXT3 is slow without the H-Tree and so is discounted.
With H-Tree, EXT3 becomes a very strong performer. However, there is a price to pay for the increased performance. GroupWise uses telldir(), seekdir(), and readdir() when enumerating files, all of which rely on a cookie that is nominally the position in the directory. Unfortunately, this interface is based on an assumption that was true when it was designed but is not true today. The assumption is that directories will always be laid out linearly, so a file offset will be enough to identify the directory entry. This offset is a signed int, where only positive values are valid, which translates to a cookie size of 31 bits in which to uniquely identify the position.
The problem arises with ext3 hashed directories, because they use a 64-bit hash that consists of a 32-bit major and a 32-bit minor hash. Ext3 returns only the major hash, which is additionally truncated by one bit. This not only throws away the collision handling that the kernel performs internally, but also creates even more collisions for telldir(). Discounted.
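To make the truncation concrete, here is a minimal sketch of how a 64-bit hash collapses into a 31-bit cookie. The function name ext3_style_cookie and the sample hash values are hypothetical, and dropping the lowest bit of the major hash is an assumption made for illustration; the point is only that entries differing in the discarded bits end up with the same cookie.

```python
def ext3_style_cookie(major: int, minor: int) -> int:
    """Collapse a (32-bit major, 32-bit minor) hash into a 31-bit cookie.

    Illustration only: the minor hash is discarded entirely, and the
    major hash loses one bit (assumed here to be the lowest bit) so the
    result fits in a positive signed 32-bit int.
    """
    major &= 0xFFFFFFFF      # keep the major hash to 32 bits
    return major >> 1        # 31 bits remain; minor is never used


# Three distinct directory entries (hypothetical hash values).
a = ext3_style_cookie(0x12345678, 0x00000001)
b = ext3_style_cookie(0x12345678, 0xFFFFFFFF)   # differs only in minor
c = ext3_style_cookie(0x12345679, 0x00000001)   # differs only in lowest major bit

print(hex(a), hex(b), hex(c))
```

All three entries produce the same cookie, so a seekdir() to that cookie cannot tell them apart.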
So now we are down to a two-horse race: ReiserFS vs. NSS.
NSS and ReiserFS
In the original OES, based on SLES9, the performance of NSS was severely lacking in comparison to ReiserFS, having only some 80% or less of the performance of Reiser. With the new SLES10-based OES2, that degradation in performance of NSS compared to ReiserFS appears to have been reduced to a more acceptable 5% or so – so long as the volume is created with Salvage disabled. For a long time, ReiserFS has had a reputation of being quick but fragile, and when the file system tree has to be rebuilt, you are more likely to get a few sticks and a pile of leaves than a whole tree. NSS rebuilds are remarkably complete (as one would expect from a file-server file system); therefore, NSS provides an excellent alternative to ReiserFS.
Looking at telldir() and ReiserFS, ReiserFS doesn't display the same problem because it uses a much smaller hash space. It has a 32-bit total, where bits 7-30 describe the hash, and bits 0-6 describe the generation number that handles collisions. Because the last bit is unused, ReiserFS doesn't run into problems with telldir(). The trade-off is that ReiserFS supports a small hash space with a maximum of 127 collisions, so it's much more prone to spuriously returning -EBUSY when the maximum number of collisions has been reached. I expect that XFS could have the same problem as EXT3 plus H-Tree, since it uses 64-bit offsets that end up getting truncated.
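The bit layout described above can be sketched as follows. This is an illustration of the layout as stated in the text (bits 7-30 for the hash, bits 0-6 for the generation number), not ReiserFS source code; the names pack_cookie and unpack_cookie are hypothetical, and raising OSError with EBUSY is simply a way of modelling the failure once the generation numbers run out.

```python
import errno

HASH_BITS = 24                          # bits 7-30
GEN_BITS = 7                            # bits 0-6
HASH_MASK = (1 << HASH_BITS) - 1
MAX_GENERATION = (1 << GEN_BITS) - 1    # 127


def pack_cookie(hash_value: int, generation: int) -> int:
    """Combine a hash and a generation number into one 31-bit cookie."""
    if not 0 <= generation <= MAX_GENERATION:
        # Models the -EBUSY case: no generation number left for this hash.
        raise OSError(errno.EBUSY, "too many collisions for this hash")
    return ((hash_value & HASH_MASK) << GEN_BITS) | generation


def unpack_cookie(cookie: int) -> tuple[int, int]:
    """Split a cookie back into (hash, generation)."""
    return (cookie >> GEN_BITS) & HASH_MASK, cookie & MAX_GENERATION


# Two colliding names keep distinct cookies via the generation number.
first = pack_cookie(0xABCDEF, 0)
second = pack_cookie(0xABCDEF, 1)
print(hex(first), hex(second), unpack_cookie(second))
```

Even the largest possible cookie stays within 31 bits, which is why telldir() is safe here, while the seven-bit generation field is what caps the number of collisions.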
In conclusion, there are really just 2 choices. The best-performing stable file system is ReiserFS, which should be used when every cycle is critical, but beware of overloading the system. The best compromise between speed and resilience is NSS with Salvage disabled.
Based on the above, I would recommend that the default file system to be used for a GroupWise system running on OES2 is NSS.
I hope this helps some folk with the sleepless nights.
The XFS filesystem does quite a lot in memory, thus the failure on the memory stick test. Still, it does a much better job of handling a large number of small files than ext3 and is developed by someone much more stable than reiser (in jail for murder). We have seen many situations where filesystems running GroupWise become corrupt and require lengthy recovery operations. The repairs can take hours. XFS is much more resilient and better suited to the task.
Using the USB stick as an example of how aggressively XFS uses caching and memory explains to most people why this would be a bad file system in a cluster (as I said) and, again, why it could (not will) cause issues on a standalone server. When this is taken in conjunction with the probable telldir() issue, I believe the logical conclusion is that, as this is "data" that we are manipulating, we cannot take that risk. The nice thing about Linux is that it provides the choice, so if you are happy with the risk, so be it.