What is the best way to handle stuck threads in a discovery job?

Hello everyone,

 

uCMDB version: 9.05.CUP12.351

DDM Content Pack 11.08.802

OS: RHEL

 

This is a generic question about probes and their capabilities.

While running heavy jobs such as Host Networking by SNMP or VLAN-related jobs, I check the probe JMX console (Local_probeFQDN --> type=JobsInformation --> Description of viewJobsStatuses) for stuck threads, run duration, etc. Often, I see that all 8 threads (the default) are stuck and remain stuck for hours on end.

 

I have tried changing settings in the DiscoveryProbe.properties file, such as the maximum number of threads, the maximum number of stuck threads, and how long the probe waits before restarting once that stuck-thread limit is reached, among others, but none of this seems to help. I would love to know how to achieve an efficient and fast discovery process.

 

Also, is there a way to clear these stuck threads so that the job continues without interruption, other than logging on to the probe box and bouncing the probe?

 

Please share your thoughts.

 

Thanks,

Praveen

  • Look for a pattern or trend. Is it always stuck on a certain host? Is it a particular job? Maybe it is stuck on a command? Find the root cause of the problem instead of changing system settings.

    The best approach is to first see which job always gets stuck, then which host or command it is getting stuck on, and then figure out why it is stuck.
  • Hi Chuong,

     

    Alright. So, I followed your suggestion to find the following:

    1: The Discovery job is Host Networking by SNMP

    2: There are a number of switches on which these threads run and end up with an In Progress (With error) message.

    3: Now I need to figure out whether some command is causing this. Where do we find that? I checked the communication log and saw no errors, but it says "Incomplete communication log..." at the end of the file.

     

    I will have to start from scratch and watch the process with one particular IP address and see what exactly is happening with those switches.

     

    Thanks,

    PKS
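
The triage described in this thread (which job first, then which host) can be sketched as a small tally. This is only a minimal sketch in Python, assuming the stuck-thread rows shown by viewJobsStatuses have been copied by hand into tuples of (job, destination, minutes running). That tuple layout, the function name `tally_stuck`, the 60-minute threshold, the second job name, and the sample addresses are all hypothetical placeholders, not a probe export format or real data.

```python
from collections import Counter

# Hypothetical rows transcribed by hand from the probe JMX page:
# (job name, destination IP, minutes the thread has been running).
# The addresses are documentation-range placeholders, not real data.
SAMPLE_ROWS = [
    ("Host Networking by SNMP", "192.0.2.10", 185),
    ("Host Networking by SNMP", "192.0.2.11", 240),
    ("Host Networking by SNMP", "192.0.2.10", 95),
    ("Some VLAN job (placeholder)", "192.0.2.30", 12),
]


def tally_stuck(rows, min_minutes=60):
    """Count long-running threads per job and per (job, destination)."""
    by_job = Counter()
    by_target = Counter()
    for job, dest, minutes in rows:
        # Only threads running at least min_minutes are treated as stuck.
        if minutes >= min_minutes:
            by_job[job] += 1
            by_target[(job, dest)] += 1
    return by_job, by_target


if __name__ == "__main__":
    by_job, by_target = tally_stuck(SAMPLE_ROWS)
    for job, n in by_job.most_common():
        print(f"{n} stuck thread(s): {job}")
    for (job, dest), n in by_target.most_common():
        print(f"{n} stuck thread(s): {job} -> {dest}")
```

If one job or one destination dominates the counts, that is the place to start a single-IP test run rather than tuning probe-wide settings.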
