OO 10.60, 2 Centrals: DCCNM9932 and DCCNM9933
DB: SQL Server 2014 (8GB of RAM)
Issue reported by the customer:
The OO Java process (java.exe, Zulu Platform x64 Architecture) consumes a large amount of memory (10GB) as soon as OO starts, which causes many flows to fail (the status of the affected flows has since changed from “Pending Canceled” to “Canceled”).
- We made the following changes on both Centrals, without success:
Increased the MaxPermSize from 256m to 1GB: wrapper.java.additional.8=-XX:MaxPermSize=1024m
Increased numberOfExecutionThreads from 200 to 800 and inBufferCapacity from 200 to 600
We suggested reducing the Initial Java Heap Size from 4GB to 1GB to widen the range in which the Java heap can grow. However, our AMX engineer recommended increasing the Initial Java Heap Size to the same value as the Max Java Heap Size (8GB), and the reason is not clear to us. Does this cause a conflict?
- Commented out the line #wrapper.java.additional.7=-XX:+HeapDumpOnOutOfMemoryError to temporarily stop dump files from being generated.
- From the provided logs, we can only see the following errors:
INFO | jvm 1 | 2017/11/21 12:43:41 | java.lang.OutOfMemoryError: Java heap space
STATUS | wrapper | 2017/11/21 12:43:41 | The JVM has run out of memory. Restarting JVM.
INFO | jvm 2 | 2017/11/22 22:31:53 | SEVERE: The web application [/oo] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@2a783751]) and a value of type [org.springframework.security.core.context.SecurityContextImpl] (value [org.springframework.security.core.context.SecurityContextImpl@ffffffff: Null authentication]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.
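To see how often the JVM runs out of memory across a long wrapper log, the pipe-delimited lines shown above can be filtered with a short script. The following is a minimal Python sketch, assuming every log line follows the same "LEVEL | source | timestamp | message" layout as the excerpt. The function name find_oom_events and the regular expression are illustrative and not part of OO.

```python
import re

# Matches the pipe-delimited wrapper log layout seen in the excerpt, e.g.
# "INFO | jvm 1 | 2017/11/21 12:43:41 | java.lang.OutOfMemoryError: Java heap space"
LINE_RE = re.compile(
    r"^\s*(?P<level>\w+)\s*\|\s*(?P<source>[^|]+?)\s*\|\s*"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s*\|\s*(?P<message>.*)$"
)


def find_oom_events(lines):
    """Return (timestamp, source, message) for each OutOfMemoryError line."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and "java.lang.OutOfMemoryError" in match.group("message"):
            events.append(
                (match.group("timestamp"), match.group("source"), match.group("message"))
            )
    return events


# Sample lines copied from the log excerpt in this case.
sample = [
    "INFO | jvm 1 | 2017/11/21 12:43:41 | java.lang.OutOfMemoryError: Java heap space",
    "STATUS | wrapper | 2017/11/21 12:43:41 | The JVM has run out of memory. Restarting JVM.",
]

for timestamp, source, message in find_oom_events(sample):
    print(timestamp, source, message)
```

For a real log, the list of sample lines would be replaced by the lines read from the wrapper log file. The path to that file depends on the installation and is not given in this case.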
The following is the solution:
1. Performance tuning:
- In central-wrapper.conf / ras-wrapper.conf:
java.initmemory: 2GB (start with 1/4 of the maximum heap size)
cloudslang.worker.inBufferCapacity=600 (this is OK)
cloudslang.worker.numberOfExecutionThreads=800 (too high; reduce to 100)
out.buffer.max.buffer.weight=7500 (increase to 15000)
- In database.properties (Centrals only):
db.pool.maxPoolSize=4000 (too high; reduce to 100 or 200)
2. Checked that the DB settings match those indicated in the DB guide.
3. Before truncating or updating the OO_TRIGGERS table, checked whether any job had been in a blocked state for more than 5 minutes.
4. Finally, to remediate the situation, we truncated the OO_EXECUTION_QUEUES table.
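The tuning recommendations from step 1 can be expressed as a small sanity check. The following is a minimal Python sketch, assuming the settings have already been read into a dictionary. The thresholds come from the notes above, while the names RECOMMENDED_MAX, RECOMMENDED_MIN, initial_heap_mb and review_settings are hypothetical and not part of OO.

```python
# Upper and lower bounds taken from the tuning notes in step 1.
# The dictionary layout and function names are illustrative only.
RECOMMENDED_MAX = {
    "cloudslang.worker.numberOfExecutionThreads": 100,
    "db.pool.maxPoolSize": 200,
}
RECOMMENDED_MIN = {
    "out.buffer.max.buffer.weight": 15000,
}


def initial_heap_mb(max_heap_mb):
    """Initial heap sized at one quarter of the maximum heap, in MB."""
    return max_heap_mb // 4


def review_settings(settings):
    """Return a list of findings for values outside the recommended bounds."""
    findings = []
    for key, limit in RECOMMENDED_MAX.items():
        if key in settings and settings[key] > limit:
            findings.append(f"{key}={settings[key]} is above the recommended {limit}")
    for key, floor in RECOMMENDED_MIN.items():
        if key in settings and settings[key] < floor:
            findings.append(f"{key}={settings[key]} is below the recommended {floor}")
    return findings


# Values as they were configured on the Centrals in this case.
current = {
    "cloudslang.worker.inBufferCapacity": 600,
    "cloudslang.worker.numberOfExecutionThreads": 800,
    "out.buffer.max.buffer.weight": 7500,
    "db.pool.maxPoolSize": 4000,
}

# With the 8GB maximum heap mentioned earlier, 1/4 gives the 2GB initial heap.
print(initial_heap_mb(8192))
for finding in review_settings(current):
    print(finding)
```

With the values from this case, the check flags numberOfExecutionThreads, out.buffer.max.buffer.weight and db.pool.maxPoolSize, and leaves inBufferCapacity alone, matching the notes in step 1.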