While monitoring my SOS housekeeping activity on 9.04 (running on W2k8R2), I noticed that after a period of normal operation, housekeeping (HK) suddenly stops freeing disk space. Instead, disk usage grows by significant amounts (hundreds of GB) during HK. At the same time, StoreOnceSoftware --list_stores tells me it freed some 13GB (Store Data Size drops from 4121GB to 4108GB). At first I tried to explain the issue away as metadata growth, but later observations tell a different story:
- After a fresh start of SOS, HK works as expected. As long as there is something to clean, disk usage usually decreases.
- The issue starts after days or even weeks of normal operation. All of a sudden, HK increases disk usage on every run. I don't yet know whether the trigger is just elapsed time, or a stretch of several days in which there is nothing to clean (say, after a change in protection times that makes omnimm --delete_unprotected_media come up empty for days in a row).
- The issue doesn't resolve itself. Even weeks after it started, HK still only fills up the disk further and never frees anything.
- Junk apparently piles up in the recycled and retired directories in the store. But be warned: never try to watch these directories during HK operation. While debugging this issue, I managed to break the store just by viewing one of these directories in Explorer. Explorer apparently locked the directory, so HK couldn't move some data and fell on its face. Store failed. Ridiculous. I had to hunt for s.bad_integrity and all that. But it finally forced me to restart the SOS process (by actually rebooting), which gave me the next insight:
- The whole mess cleans itself up when SOS is restarted. It was nice to see several TB of space finally being freed, mostly by the cleanup of the recycled directory somewhere in the store structure.
- There is, however, a glitch: stopping the SOS service the regular way never succeeds. Windows just displays a progress bar for a while and finally tells me the process didn't respond to the stop request in time. And indeed, the process lingers, as far as I tested, forever (I waited 2h). The only way forward is to kill the process from Task Manager or similar. I don't like that at all, knowing how intricate and brittle SOS tends to be. My fear of losing all >100TB of user data in that store runs deep.
- Sadly, while I can use --stop_store and --start_store on just the (single) store in question, doing so does not trigger the cleanup.
Anyone else seeing this? Workarounds? Is there a known fix? Maybe even in 9.05?