Worker issues

Hi, I am frequently seeing the error messages below in the execution logs. Please help me analyze and remediate them.

Error 1

[WorkerFillBufferThread] (InBuffer.java:148) ERROR - Fill InBuffer thread was interrupted... java.lang.InterruptedException: sleep interrupted

Error 2

[scoreWorkerScheduler-1] (OutboundBufferImpl.java:172) ERROR - Failed to drain buffer, invoking worker internal recovery... java.lang.RuntimeException: runningExecutionPlan is null

The second error continues with a long stack trace. I posted this a couple of times about an hour ago, but those two posts were removed.

Please suggest the corrective actions we can initiate.

 

Satya

  • Which version are you running? I remember that the second error appeared in one of the older versions (I think somewhere between 10.80 and 2018.05).

    As far as I remember, in those versions the running execution plans are shared between flow instances of the same flow type.

    The references are counted in the DB: whenever a flow instance ends, it either deletes the running execution plan or reduces the reference count.

    Under certain load conditions, the running execution plan was deleted even though there was still an active instance.

    I think this should be gone in 2019.11 at least.

    To clean that up you will need a specific DB cleanup script, as far as I remember (depending on the OO version). I suggest contacting Support.
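To illustrate the failure mode described above, here is a minimal sketch of how an unsynchronized reference count can delete a shared plan while an instance still needs it. This is not OO's actual code: `PlanStore`, its methods, and the plan id are hypothetical stand-ins, and the interleaving is replayed by hand rather than with real threads so the outcome is deterministic.

```python
class PlanStore:
    """Toy stand-in for the DB rows holding shared running execution plans."""

    def __init__(self):
        self.plans = {}       # plan_id -> plan payload
        self.ref_counts = {}  # plan_id -> number of active flow instances

    def create(self, plan_id, plan):
        # First flow instance of this flow type stores the plan with count 1.
        self.plans[plan_id] = plan
        self.ref_counts[plan_id] = 1

    def read_count(self, plan_id):
        return self.ref_counts.get(plan_id, 0)

    def write_count(self, plan_id, count):
        # A count of zero means "no active instances", so the plan is deleted.
        if count <= 0:
            self.plans.pop(plan_id, None)
            self.ref_counts.pop(plan_id, None)
        else:
            self.ref_counts[plan_id] = count

    def get_plan(self, plan_id):
        plan = self.plans.get(plan_id)
        if plan is None:
            # Same message as in the log above, raised here only for illustration.
            raise RuntimeError("runningExecutionPlan is null")
        return plan


def simulate_race():
    store = PlanStore()
    store.create("flow-type-1", "plan payload")  # instance A starts, count = 1

    # Non-atomic read-modify-write, interleaved between two instances:
    seen_by_b = store.read_count("flow-type-1")      # B starts, reads 1
    seen_by_a = store.read_count("flow-type-1")      # A ends, reads 1
    store.write_count("flow-type-1", seen_by_a - 1)  # A writes 0, plan deleted
    store.write_count("flow-type-1", seen_by_b + 1)  # B writes 2, plan already gone

    try:
        store.get_plan("flow-type-1")  # B is still active and needs the plan
        return "plan found"
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(simulate_race())
```

In this sketch the count ends up non-zero while the plan itself is missing. Doing the read and the write as one atomic step (for example inside a single transaction or under a lock) would avoid that interleaving.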

  • Issue resolved by removing stale entries.

  • Hi there,

    Could you elaborate on removing the stale records? Are these in the DB? How did you find them?

    It looks like our system is having similar issues. MF Support has provided scripts to clean up the database, but they still haven't worked.

  • Hi

    I saw the same issue on my installation. MF Support sent DB cleanup scripts, but on the first attempt they did NOT work either! After a few emails back and forth, they sent some SQL scripts that worked, and I got rid of the stale records. So maybe you should go back to Support.

  • Thanks hbit. 

    Support have given me a number of truncate scripts that will hopefully fix the issue. Shall let you know how that goes. 

  • The truncate scripts that support provided have worked and fixed the issue. 

     

    However, does anyone know how the OO_EXECUTION_QUEUES table is populated? While the issue was occurring, that table contained 7500 records. Since running the scripts today, the table has stayed empty.

    It would be good to narrow down what caused the stale records, and whether this table being populated is a sign that the same issue will occur again. This might be another question for Support.
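This does not answer how the table is populated, but one way to see whether it starts filling up again is to sample its row count over time. The sketch below is only an illustration: it uses Python's built-in `sqlite3` with an in-memory stand-in table, since the real schema and database driver depend on your OO installation. The helper names, the single `id` column, and the threshold value are all assumptions.

```python
import sqlite3


def count_rows(conn, table="OO_EXECUTION_QUEUES"):
    """Return the number of rows in the given table."""
    # The table name is interpolated into the SQL, so only allow plain identifiers.
    if not table.replace("_", "").isalnum():
        raise ValueError("unexpected table name: %r" % table)
    cur = conn.execute("SELECT COUNT(*) FROM " + table)
    return cur.fetchone()[0]


def is_growing(samples, threshold):
    """True if the counts never decreased and the latest one exceeds threshold."""
    never_decreased = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_decreased and samples[-1] > threshold


if __name__ == "__main__":
    # In-memory stand-in; the real table has a different (unknown here) schema.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE OO_EXECUTION_QUEUES (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO OO_EXECUTION_QUEUES (id) VALUES (?)",
                     [(i,) for i in range(3)])
    print(count_rows(conn))
    # Hypothetical samples taken at intervals; 1000 is an arbitrary threshold.
    print(is_growing([0, 10, 7500], threshold=1000))
```

Against a real installation you would swap the connection for your database's own driver and run the count on a schedule, then compare the samples with what Support considers normal for that table.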