UCMDB Support Tip: What are stuck threads?
A parameter in the Pattern/Adapter configuration sets the "max execution time" for the Job that implements that Pattern/Adapter.
If the job takes longer than this limit (default: 15 minutes) to execute all of its commands and to parse and send the results, the Probe itself considers it "stuck".
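As a rough illustration, the stuck check amounts to comparing a job's elapsed time against its configured limit. This is a minimal sketch, not UCMDB code: the function `is_stuck` and the constant `DEFAULT_MAX_EXECUTION_TIME` are hypothetical names, and it assumes "longer than the limit" means strictly greater.

```python
from datetime import datetime, timedelta

# Hypothetical constant mirroring the 15-minute default "max execution time";
# in UCMDB the real value comes from the Pattern/Adapter configuration.
DEFAULT_MAX_EXECUTION_TIME = timedelta(minutes=15)


def is_stuck(started_at: datetime, now: datetime,
             max_execution_time: timedelta = DEFAULT_MAX_EXECUTION_TIME) -> bool:
    """Return True when a job has run longer than its max execution time (hypothetical helper)."""
    return now - started_at > max_execution_time


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    print(is_stuck(start, start + timedelta(minutes=10)))  # False: still within the limit
    print(is_stuck(start, start + timedelta(minutes=16)))  # True: over the 15-minute limit
```

The limit is passed as a parameter so that a per-Pattern/Adapter value can override the default, which is the knob to raise when a job is merely slow rather than hung.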
A parameter in DiscoveryProbe.properties defines how many simultaneous "stuck threads" it takes for the Probe to decide to restart itself (default: 8). When the Probe restarts, all processes and their threads are killed, so any running job is interrupted. (I'm not sure whether the job reruns immediately after the Probe starts up or only at its next scheduled invocation.)
As long as the number of stuck threads stays below this limit, the Probe regularly checks whether each stuck job has finished; when one has, it marks the thread as "not stuck" and decrements the counter.
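The counting behavior described above can be modeled as a small watchdog. This is a toy sketch under stated assumptions, not the Probe's actual implementation: the class `StuckThreadWatchdog`, its methods, and `DEFAULT_MAX_STUCK_THREADS` are hypothetical, and it assumes the restart fires as soon as the stuck count *reaches* the configured threshold.

```python
from dataclasses import dataclass, field

# Hypothetical constant mirroring the default restart threshold of 8 stuck threads
# set in DiscoveryProbe.properties.
DEFAULT_MAX_STUCK_THREADS = 8


@dataclass
class StuckThreadWatchdog:
    """Toy model of the Probe's stuck-thread bookkeeping (not actual UCMDB code)."""
    max_stuck_threads: int = DEFAULT_MAX_STUCK_THREADS
    stuck: set = field(default_factory=set)  # ids of threads currently marked stuck
    restarts: int = 0

    def mark_stuck(self, thread_id: str) -> bool:
        """Record a stuck thread; return True if this pushes the count to the restart threshold."""
        self.stuck.add(thread_id)
        if len(self.stuck) >= self.max_stuck_threads:
            self.restart()
            return True
        return False

    def job_finished(self, thread_id: str) -> None:
        """A stuck job eventually completed: mark its thread "not stuck" (counter decreases)."""
        self.stuck.discard(thread_id)

    def restart(self) -> None:
        """Simulate a Probe restart: every thread is killed, so no thread remains stuck."""
        self.restarts += 1
        self.stuck.clear()


if __name__ == "__main__":
    dog = StuckThreadWatchdog(max_stuck_threads=3)
    dog.mark_stuck("job-a")
    dog.mark_stuck("job-b")
    dog.job_finished("job-a")        # job-a finished late, count drops back to 1
    dog.mark_stuck("job-c")          # count is 2, still below the threshold
    print(dog.mark_stuck("job-d"))   # True: count reaches 3 and triggers a restart
    print(dog.restarts, len(dog.stuck))
```

The demo shows why a slow-but-finishing job does not necessarily cause a restart: as long as stuck jobs complete and drop out of the set, the count may never reach the threshold.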
Stuck threads usually happen for one of two main reasons:
1. The job execution is genuinely stuck: a hung client, or an error in the code that makes execution take far too long (e.g. an infinite loop).
2. The job execution takes longer than expected: network latency, a heavily loaded destination that is slow to return data, etc.
To find out which case applies, examine the discovery's Communication.Log, then resolve accordingly: change the Pattern/Adapter configuration, fix the code, or examine the destination to understand why it is working so hard.
Re: UCMDB Support Tip: What are stuck threads?
What about stuck everything? I rebooted the app server, even gave it more RAM, rebooted the probes, and it was still stuck on the same handful of jobs for more than two hours. This is a regular occurrence. I'm talking basic host connection on WMI, some VMware connections, fewer than 300 total jobs, and stuck FOR HOURS. HAPPENS REGULARLY. YES, I'M FRUSTRATED. There are too many logs to know which one is truly relevant.