Tuning garbage collection and IDM JRE

This was originally part of an answer to a forum query with a similar title. Things move really fast in the IDM world, especially in the forums, so it becomes difficult to keep track of a post and refer to it when needed. So why not a Cool Solution?

Here it goes:

As far as IDM is concerned, the default JVM options serve well. Nevertheless, here are a few pointers:

  •  The aim of GC tuning is to avoid long running times for the garbage collector. Any optimization we attempt should be judged on this factor. If GC is running well, no change is needed. Do not fix it if it ain't broken.

    To find out the GC running times, the following help:

    •  Generating a GC log file: add these options to the JVM

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file-path>

You should see GC details like these:

43864.961: [GC [PSYoungGen: 41920K->384K(42432K)] 128084K->87204K(129856K), 0.0129540 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
43864.974: [Full GC [PSYoungGen: 384K->0K(42432K)] [ParOldGen: 86820K->61611K(87424K)] 87204K->61611K(129856K) [PSPermGen: 43136K->43136K(43776K)], 0.1013350 secs] [Times: user=0.45 sys=0.00, real=0.10 secs]

The first is a partial (minor) GC run that collects the new (or Young) Gen space, while the second is a Full GC run that collects the Old Gen space as well.

Typically these runs take milliseconds. When they do not, that is when optimization may help. Since the partial GC runs more frequently than the Full GC, any delay here affects application performance more than a delay caused by Full GC. The application is affected because, depending on the algorithm selected for GC, the JVM stops every thread other than the GC threads while a collection runs.
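
To pull the pause times out of a log like the one above, a small script can help. The following is a minimal Python sketch, assuming the log lines have the same shape as the two samples shown; the names GC_LINE, parse_gc_log and slow_collections, and the 1-second default threshold, are made up for illustration and are not part of any IDM or JDK tooling.

```python
import re

# Matches lines shaped like the two samples: an uptime stamp, "GC" or
# "Full GC", and the total pause reported just before the first "secs]".
GC_LINE = re.compile(
    r"(?P<uptime>\d+\.\d+): \[(?P<kind>Full GC|GC) .*?, (?P<pause>\d+\.\d+) secs\]"
)

# The two sample lines from this article.
SAMPLE = [
    "43864.961: [GC [PSYoungGen: 41920K->384K(42432K)] 128084K->87204K(129856K), "
    "0.0129540 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]",
    "43864.974: [Full GC [PSYoungGen: 384K->0K(42432K)] "
    "[ParOldGen: 86820K->61611K(87424K)] 87204K->61611K(129856K) "
    "[PSPermGen: 43136K->43136K(43776K)], 0.1013350 secs] "
    "[Times: user=0.45 sys=0.00, real=0.10 secs]",
]


def parse_gc_log(lines):
    """Yield (uptime_seconds, kind, pause_seconds) for each recognised line."""
    for line in lines:
        match = GC_LINE.search(line)
        if match:
            yield (
                float(match.group("uptime")),
                match.group("kind"),
                float(match.group("pause")),
            )


def slow_collections(lines, threshold_secs=1.0):
    """Return the events whose pause is longer than threshold_secs."""
    return [event for event in parse_gc_log(lines) if event[2] > threshold_secs]


if __name__ == "__main__":
    for uptime, kind, pause in parse_gc_log(SAMPLE):
        print(f"{uptime:>12.3f}  {kind:<8} {pause * 1000:.1f} ms")
```

To run it against a real log, pass an open file instead of SAMPLE, for example slow_collections(open(path)) where path is wherever -Xloggc was pointed. Lines in other collectors' formats are simply skipped.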

  • Adding JMX parameters and connecting via JConsole.

-Dcom.sun.management.jmxremote.port=xxx -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

As pointed out in the forum by Aaron, be sure to enable authentication if you need the JMX port to be secure.

JConsole will show the accumulated time spent in each collector together with the number of collections, so dividing one by the other gives the average running time per collection.

GC time:
0.760 seconds on PS MarkSweep (3 collections)
0.664 seconds on PS Scavenge (21 collections)

It will also show the heap distribution between New Gen ("Eden", "Survivor") and Old Gen space under the "Memory" tab.
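
If the "GC time" figures are read as accumulated totals (which is how the JVM's GarbageCollectorMXBean reports collection time), the average pause per collection is a simple division. Here is a small Python sketch of that arithmetic using the two figures above; the function name average_pause_ms is just an illustrative choice.

```python
def average_pause_ms(total_seconds, collections):
    """Average pause in milliseconds from an accumulated time and a count."""
    if collections == 0:
        return 0.0  # no collections yet, so nothing to average
    return total_seconds / collections * 1000.0


if __name__ == "__main__":
    # Figures copied from the JConsole sample in this article.
    print("PS MarkSweep:", round(average_pause_ms(0.760, 3), 1), "ms")
    print("PS Scavenge: ", round(average_pause_ms(0.664, 21), 1), "ms")
```

On these numbers the Full GC collector (PS MarkSweep) averages roughly a quarter of a second per run, and the minor collector (PS Scavenge) a few tens of milliseconds.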

  • Heap distribution can also be dumped using

jstat -gc <vmid> 1000

Usually the vmid is the process id. However, the value returned by jps is more reliable.

  •  Now coming to the problems one may encounter:

    • Partial/Minor GC runs too often, say every second.

      In this case not enough memory has been allocated to the "Eden" space when configuring the heap sizes, which means GC has to run often to free up space. A parameter of -XX:SurvivorRatio=10 would help, since it increases the share of Young Gen given to the "Eden" space. Parameters -Xmx and -Xms can be set at 500m (or more).

    • Partial/Minor GC runs for an extended interval of time, say 3-4 seconds.

      In this case too large a space has been given to Young Gen, so GC has to run for a longer time in order to move through all areas of memory. This is where the IDM advice "increase the heap in small increments" comes into the picture. If too large a heap is allocated, considerable time is spent going through it to identify spaces to be freed.

      Setting the parameter -XX:NewRatio=3 helps to reduce the space given to Young Gen by increasing the ratio of memory given to Old Gen. By default this parameter is 2, which means that Old Gen is twice the size of Young Gen.

    • Full GC runs too often, more than 2-3 times per 10 minutes.

      In this case there is not enough space set aside for Old Gen. Consider increasing the ratio to -XX:NewRatio=3 or maybe -XX:NewRatio=4.

    • Full GC runs for extended periods of time, say 5 seconds.

      Revisit the memory allocation settings (-Xmx, -Xms) to check that they are not too large. However, if this memory requirement is mandatory, consider changing the GC algorithm with -XX:+UseConcMarkSweepGC -XX:+UseParNewGC. Concurrent Mark & Sweep reduces the time for which the entire application is stopped.
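
To see what the two ratio parameters discussed in this list do to the heap layout, the arithmetic can be sketched in a few lines of Python. This is only a nominal calculation: it assumes NewRatio is Old Gen divided by Young Gen, SurvivorRatio is Eden divided by one of the two Survivor spaces, and a default SurvivorRatio of 8, and it ignores the JVM's adaptive resizing and alignment rounding. The function name generation_sizes and the 480 MB heap are illustrative choices.

```python
def generation_sizes(heap_mb, new_ratio=2, survivor_ratio=8):
    """Nominal generation sizes in MB for a given total heap.

    new_ratio      = Old Gen / Young Gen        (-XX:NewRatio)
    survivor_ratio = Eden / one Survivor space  (-XX:SurvivorRatio)
    """
    young = heap_mb / (new_ratio + 1)
    old = heap_mb - young
    # Young Gen holds Eden plus two Survivor spaces of equal size.
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}


if __name__ == "__main__":
    for new_ratio, survivor_ratio in [(2, 8), (3, 8), (3, 10)]:
        sizes = generation_sizes(480, new_ratio, survivor_ratio)
        rounded = {name: round(mb, 1) for name, mb in sizes.items()}
        print(f"NewRatio={new_ratio} SurvivorRatio={survivor_ratio}: {rounded}")
```

Raising NewRatio shifts memory from Young Gen to Old Gen, while raising SurvivorRatio shifts memory within Young Gen from the Survivor spaces to Eden. The real sizes reported by JConsole or jstat will differ somewhat from these nominal values.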

