I'm seeing an issue with GMS 24.2 on heavily utilized servers. It was not an issue with GMS 18.5, the version we ran before upgrading. The servers have ~500 users each.
Background: I've learned to watch the "Idle Threads" log entries based on this TID: https://support.microfocus.com/kb/doc.php?id=7014654. Under normal operation the logs report Idle Threads about once per second. I've found that as long as that is happening (regardless of how many threads are idle), the system is generally working.
After 5-10 days of uptime (the interval varies and seems random), I receive reports that people are unable to sync their devices or send/receive email. When I check the logs, the Idle Threads entries have stopped completely, and they stop shortly before the outage is discovered. In short, when the Idle Threads entries stop, I'm having an outage.
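In case it helps anyone compare notes, here is a minimal Python sketch of how that check could be automated so the stall is caught before users report it. It is only an illustration under assumptions: the marker text "Idle Threads", the 60-second threshold, and the names `StallDetector` and `follow` are my own placeholders, not anything GMS provides, and the log path has to be supplied for your own install.

```python
import time

# Assumption: the health entry contains this text; adjust to the real log wording.
MARKER = "Idle Threads"


class StallDetector:
    """Tracks how long it has been since a marker line was last seen."""

    def __init__(self, marker=MARKER, max_silence=60.0, clock=time.monotonic):
        self.marker = marker.lower()
        self.max_silence = max_silence  # seconds; hypothetical threshold
        self.clock = clock
        self.last_seen = clock()  # the timer starts when the detector is created

    def feed(self, line):
        """Record the current time if the line contains the marker (case-insensitive)."""
        if self.marker in line.lower():
            self.last_seen = self.clock()

    def stalled(self):
        """True once the marker has been absent for longer than max_silence."""
        return self.clock() - self.last_seen > self.max_silence


def follow(path, detector, poll=1.0):
    """Tail a log file and print one warning per stall. Runs until interrupted.

    Log rotation is not handled; restart the watcher after the file rotates.
    """
    alerted = False
    with open(path, "r", errors="replace") as fh:
        fh.seek(0, 2)  # jump to the end so only new entries are read
        while True:
            line = fh.readline()
            if line:
                detector.feed(line)
                if not detector.stalled():
                    alerted = False  # entries resumed, so re-arm the warning
                continue
            if detector.stalled() and not alerted:
                print(f"No '{MARKER}' entry for over {detector.max_silence:.0f}s")
                alerted = True
            time.sleep(poll)
```

To try it, call `follow(path_to_log, StallDetector())` with the path of whichever log carries the Idle Threads entries on your server. The clock is injectable so the threshold logic can be checked without waiting, and the `print` could be swapped for an email or a monitoring hook.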
Has anybody else seen behavior like this? I've tried to find an event that might shed some light on the situation, but the logs are a nightmare to go through and so far I'm stuck.
Restarting GMS always fixes it right away. As far as I can tell, it doesn't appear to be a memory issue. One of the servers has 54GB of RAM and 12 CPUs. That keeps performance okay while it is running, but it does not stop the condition from happening. The server went 10 days before the current outage.
Thanks for any ideas!