(OO) Support Tip: Java heap is getting full

Environment:

OO 10.60, 2 Centrals: DCCNM9932 and DCCNM9933
DB: SQL Server 2014 (8GB of RAM)

Current Issue:

The customer reports the following issue:
The OO Java process (java.exe – Zulu Platform x64 Architecture) consumes a large amount of memory (10 GB) as soon as OO starts, causing many flows to fail (the status of the affected flows changed from “Pending Canceled” to “Canceled”).

Troubleshooting done:
- We made the following changes on both Centrals, without success:
Increased MaxPermSize from 256m to 1 GB: wrapper.java.additional.8=-XX:MaxPermSize=1024m

Increased numberOfExecutionThreads from 200 to 800 and inBufferCapacity from 200 to 600:
wrapper.java.additional.25=-Dcloudslang.worker.numberOfExecutionThreads=800
wrapper.java.additional.26=-Dcloudslang.worker.inBufferCapacity=600

We suggested reducing the initial Java heap size from 4 GB to 1 GB to widen the range between the initial and maximum heap sizes, but our AMX engineer recommended raising the initial heap size to match the maximum heap size (8 GB), and I don’t know why. Do these two recommendations conflict?
- Commented out the line #wrapper.java.additional.7=-XX:+HeapDumpOnOutOfMemoryError to temporarily stop heap dump files from being generated.

- The provided logs show only the following errors:
INFO   | jvm 1    | 2017/11/21 12:43:41 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper  | 2017/11/21 12:43:41 | The JVM has run out of memory.  Restarting JVM.
INFO   | jvm 2    | 2017/11/22 22:31:53 | SEVERE: The web application [/oo] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@2a783751]) and a value of type [org.springframework.security.core.context.SecurityContextImpl] (value [org.springframework.security.core.context.SecurityContextImpl@ffffffff: Null authentication]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory.
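When the wrapper log is large, the heap-exhaustion events can be pulled out programmatically instead of by eye. Below is a minimal Python sketch, assuming the pipe-separated layout of the excerpt above (level | source | timestamp | message); the function name and sample lines are illustrative, and reading the actual log file (whose path is not given here) is left out.

```python
import re

# Layout of the excerpt above: LEVEL | source | yyyy/mm/dd hh:mm:ss | message
LOG_LINE = re.compile(
    r"^(?P<level>\w+)\s*\|\s*(?P<source>[^|]+?)\s*\|\s*"
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*(?P<msg>.*)$"
)

# Messages that indicate the heap was exhausted and the wrapper restarted the JVM.
MEMORY_MARKERS = ("java.lang.OutOfMemoryError", "The JVM has run out of memory")

def find_memory_events(lines):
    """Return (timestamp, source, message) for each line that reports heap exhaustion."""
    events = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a wrapper log line, e.g. a stack-trace continuation
        msg = m.group("msg")
        if any(marker in msg for marker in MEMORY_MARKERS):
            events.append((m.group("ts"), m.group("source"), msg))
    return events

sample = [
    "INFO   | jvm 1    | 2017/11/21 12:43:41 | java.lang.OutOfMemoryError: Java heap space",
    "STATUS | wrapper  | 2017/11/21 12:43:41 | The JVM has run out of memory.  Restarting JVM.",
    "INFO   | jvm 2    | 2017/11/22 22:31:53 | SEVERE: The web application [/oo] created a ThreadLocal ...",
]
events = find_memory_events(sample)
for ts, source, msg in events:
    print(ts, source, msg)
```

Comparing the returned timestamps with the times at which flows moved to “Canceled” may help show whether the failures follow each JVM restart.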
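Because wrapper.java.additional.N entries are easy to mis-number or leave commented out, it can help to confirm which JVM options are actually active on each Central. The Python sketch below assumes the plain key=value layout of the lines in the troubleshooting notes above; the function names are hypothetical. It also flags -XX:MaxPermSize: PermGen is not part of the Java heap, so raising it cannot relieve a "Java heap space" error, and HotSpot-based JVMs ignore the option from Java 8 onward (the sketch assumes the bundled Zulu runtime is Java 8 or later).

```python
import re

# wrapper.java.additional.<N>=<JVM option>; lines starting with '#' do not match.
ADDITIONAL = re.compile(r"^\s*wrapper\.java\.additional\.(\d+)\s*=\s*(\S.*?)\s*$")

def parse_additional_options(text):
    """Map each active wrapper.java.additional.N index to its JVM option string."""
    options = {}
    for line in text.splitlines():
        m = ADDITIONAL.match(line)
        if m:
            options[int(m.group(1))] = m.group(2)
    return options

def system_properties(options):
    """Collect -Dkey=value options into a {key: value} dict."""
    props = {}
    for opt in options.values():
        if opt.startswith("-D") and "=" in opt:
            key, value = opt[2:].split("=", 1)
            props[key] = value
    return props

def ineffective_options(options, java_major):
    """Flag options that cannot relieve 'Java heap space' errors on this Java version."""
    notes = []
    for index, opt in sorted(options.items()):
        if opt.startswith("-XX:MaxPermSize") and java_major >= 8:
            notes.append(f"additional.{index}: {opt} is ignored (PermGen was removed in Java 8)")
    return notes

conf = """
wrapper.java.additional.8=-XX:MaxPermSize=1024m
wrapper.java.additional.25=-Dcloudslang.worker.numberOfExecutionThreads=800
wrapper.java.additional.26=-Dcloudslang.worker.inBufferCapacity=600
#wrapper.java.additional.7=-XX:+HeapDumpOnOutOfMemoryError
"""
opts = parse_additional_options(conf)
print(system_properties(opts))
# java_major=8 is an assumption about the bundled Zulu runtime.
print(ineffective_options(opts, java_major=8))
```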

The following is the solution:

1. Performance tuning:

- In central / ras-wrapper.conf:
java.initmemory = 2 GB (start with 1/4 of the max)
maxmemory = 8 GB (OK)
cloudslang.worker.inBufferCapacity = 600 (OK)
cloudslang.worker.numberOfExecutionThreads = 800 (too high; reduce to 100)
out.buffer.max.buffer.weight = 7500 (increase to 15000)

- In database.properties (Centrals only):
db.pool.maxPoolSize = 4000 (too high; reduce to 100 or 200)

2. Checked that the DB settings match those recommended in the DB guide.

3. Before truncating or updating the OO_TRIGGERS table, checked whether any job had been in a blocked state for more than 5 minutes.
4. Finally, to remediate the situation, we truncated the OO_EXECUTION_QUEUES table.
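The Performance tuning values above can be sanity-checked with a little arithmetic. This Python sketch assumes the Java Service Wrapper takes the init/max memory settings in megabytes; the helper names are hypothetical. It also shows that the JVM accepts an initial heap equal to the maximum (it only refuses to start when the initial size is larger), and how the per-Central pool size multiplies across the two Centrals that share one SQL Server.

```python
# Assumption: the wrapper's initmemory/maxmemory settings are given in megabytes.

def recommended_init_mb(max_mb: int, fraction: int = 4) -> int:
    """Initial heap as 1/fraction of the maximum (this tip suggests 1/4)."""
    return max_mb // fraction

def heap_settings_valid(init_mb: int, max_mb: int) -> bool:
    """The JVM refuses to start if the initial heap exceeds the maximum; equal values are allowed."""
    return 0 < init_mb <= max_mb

def total_db_connections(pool_size_per_central: int, centrals: int) -> int:
    """Worst case: every Central fills its own connection pool at the same time."""
    return pool_size_per_central * centrals

max_mb = 8 * 1024                      # maxmemory = 8 GB
init_mb = recommended_init_mb(max_mb)  # 2048 MB = 2 GB
print(f"initmemory={init_mb} MB, maxmemory={max_mb} MB, valid={heap_settings_valid(init_mb, max_mb)}")
print("init == max is also valid:", heap_settings_valid(max_mb, max_mb))

for pool in (4000, 200, 100):
    print(f"maxPoolSize={pool} -> up to {total_db_connections(pool, centrals=2)} SQL Server connections")
```

With two Centrals, a maxPoolSize of 4000 allows up to 8000 connections against the SQL Server, versus 200 to 400 at the recommended values.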
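Step 3 amounts to a simple filter: find jobs that have been blocked for more than 5 minutes. This Python sketch works on in-memory rows rather than querying the database, because the OO_TRIGGERS column names are not given in this tip; the TriggerRow fields and the "BLOCKED"/"WAITING" state strings are hypothetical placeholders to be mapped to the real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TriggerRow:
    # Hypothetical row shape: the real OO_TRIGGERS columns are not documented in this tip.
    name: str
    state: str             # hypothetical state values, e.g. "BLOCKED" or "WAITING"
    state_since: datetime  # hypothetical: when the row entered its current state

def blocked_too_long(rows, now, threshold=timedelta(minutes=5)):
    """Return names of triggers that have been BLOCKED for longer than the threshold."""
    return [r.name for r in rows if r.state == "BLOCKED" and now - r.state_since > threshold]

now = datetime(2017, 11, 22, 12, 0, 0)
rows = [
    TriggerRow("job-a", "BLOCKED", now - timedelta(minutes=12)),
    TriggerRow("job-b", "BLOCKED", now - timedelta(minutes=2)),
    TriggerRow("job-c", "WAITING", now - timedelta(hours=1)),
]
print(blocked_too_long(rows, now))  # ['job-a']
```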