In the Thread Pool Dispatcher threadMax article, you saw that the total time taken for all threads to complete their invocations is significantly shorter when more worker threads are available to do the work. However, this is not always the case.
In this article, you will see that in some cases, having more worker threads does not lead to significantly better performance.
How much performance improvement do you expect to see when "threadMax" is increased from 10 to 100?
Configure the Client by modifying c.sh:
Configure the Server by modifying s.sh:
Make sure you have set up the necessary VisiBroker environment and built the example before running this demonstration.
Result with "threadMax" set to 10 (server resource usage, followed by the client output):
MEMORY(KB)  THREADS  SOCKETS
12352       14       1
Client Thread Id:P24282_T99_S1 work's takes 1.22422 seconds to complete.
Client Thread Id:P24282_T27_S1 work's takes 1.69546 seconds to complete.
Client Thread Id:P24282_T60_S1 work's takes 1.25104 seconds to complete.
Client Thread Id:P24282_T95_S1 work's takes 1.40158 seconds to complete.
Client Thread Id:P24282_T16_S1 work's takes 1.36785 seconds to complete.
. . . . . .
Total Time taken for 100 threads in PID 24282 to complete all invocations is 15.9356 seconds
Result with "threadMax" set to 100 (server resource usage, followed by the client output):
MEMORY(KB)  THREADS  SOCKETS
14704       104      1
Client Thread Id:P24838_T46_S1 work's takes 14.0853 seconds to complete.
Client Thread Id:P24838_T51_S1 work's takes 14.0938 seconds to complete.
Client Thread Id:P24838_T36_S1 work's takes 14.0769 seconds to complete.
Client Thread Id:P24838_T21_S1 work's takes 14.0713 seconds to complete.
Client Thread Id:P24838_T41_S1 work's takes 14.0797 seconds to complete.
. . . . . .
Total Time taken for 100 threads in PID 24838 to complete all invocations is 15.7661 seconds
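The client output above is truncated, so only a few per-invocation times are visible. To compare complete runs yourself, a small script can summarize the client output. The following is a minimal sketch. It assumes the line formats shown in the sample output. The helper name `summarize_client_log` is hypothetical and is not part of VisiBroker or the example.

```python
import re
from statistics import mean

# Patterns assumed from the sample client output in this article, e.g.
# "Client Thread Id:P24282_T99_S1 work's takes 1.22422 seconds to complete."
INVOCATION_RE = re.compile(r"work's takes ([0-9.]+) seconds to complete")
TOTAL_RE = re.compile(r"to complete all invocations is ([0-9.]+) seconds")


def summarize_client_log(lines):
    """Summarize client output lines (hypothetical helper).

    Returns a dict with the number of per-invocation lines found, their
    average, minimum and maximum in seconds, and the reported total time
    (None when the "Total Time taken" line is missing).
    """
    times = []
    total = None
    for line in lines:
        match = INVOCATION_RE.search(line)
        if match:
            times.append(float(match.group(1)))
            continue
        match = TOTAL_RE.search(line)
        if match:
            total = float(match.group(1))
    return {
        "count": len(times),
        "average": mean(times) if times else None,
        "min": min(times) if times else None,
        "max": max(times) if times else None,
        "total": total,
    }


if __name__ == "__main__":
    sample = [
        "Client Thread Id:P24282_T99_S1 work's takes 1.22422 seconds to complete.",
        "Client Thread Id:P24282_T27_S1 work's takes 1.69546 seconds to complete.",
        "Total Time taken for 100 threads in PID 24282 to complete all invocations is 15.9356 seconds",
    ]
    print(summarize_client_log(sample))
```

To summarize a complete run, save the client output to a file and pass the open file object to `summarize_client_log`. Iterating over a file object yields its lines.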
Compare the resource consumption and invocation performance measured before and after changing the server-side "vbroker.se.iiop_tp.scm.iiop_tp.dispatcher.threadMax" property from 10 to 100.
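||threadMax||Server Memory (KB)||Server Threads||Avg Time Per Invocation (sec)||Total Time Taken (sec)||
|10|12352|14|1.22 to 1.70 (range of sampled lines)|15.9356|
|100|14704|104|14.07 to 14.09 (range of sampled lines)|15.7661|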
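Key observations after tuning:
* Server memory usage increased from 12352 KB to 14704 KB (about 19%), and the number of server threads increased from 14 to 104.
* The sampled invocations each took about 14 seconds to complete, compared with about 1.2 to 1.7 seconds before tuning.
* The total time for all 100 invocations dropped only slightly, from 15.9356 to 15.7661 seconds (about 1%).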
You may be surprised by how little the performance improves (in some environments it may even degrade slightly) when 10 times as many worker threads service the invocations. Each invocation also takes significantly longer to complete. This looks like a poor return on investment, but there is an explanation for this behavior.
When 100 worker threads service the CPU intensive requests concurrently, the operating system performs many context switches so that each of the 100 threads gets a slice of CPU time. As a result, each thread takes longer to complete its operation. When only 10 worker threads service the requests concurrently, fewer context switches occur and each thread gets a larger share of the CPU, so each operation completes sooner. The total amount of CPU work is the same in both runs, and the nearly unchanged total time suggests the CPU was already fully used with 10 threads. The extra threads therefore cannot finish the whole batch much sooner; they only divide the same CPU time into smaller slices.
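The reasoning above can be made concrete with a simple back-of-the-envelope model. The sketch below is hypothetical and idealized. It does not describe how VisiBroker or the operating system actually schedules threads. It assumes that every invocation is queued at once, that each needs the same amount of CPU time, and that the CPUs are shared fairly among the running threads. Context-switch overhead is ignored. The function name and parameters are illustrative only.

```python
def simulate_cpu_bound(n_invocations, cpu_seconds, cores, thread_max):
    """Idealized model of CPU intensive invocations served by a thread pool.

    Hypothetical assumptions (not VisiBroker or OS internals):
      * all invocations are queued at time 0 and each needs `cpu_seconds`
        of CPU time;
      * at most `thread_max` invocations run at once; the rest wait;
      * the `cores` CPUs are shared fairly among the running threads;
      * context-switch overhead is ignored.

    Returns (longest_invocation_s, total_s): the longest time an invocation
    spends running on a worker thread, and the time to finish all of them.
    """
    if thread_max < 1 or cores < 1:
        raise ValueError("thread_max and cores must be at least 1")
    remaining = n_invocations
    longest = 0.0
    total = 0.0
    while remaining > 0:
        running = min(thread_max, remaining)
        # Each running thread gets min(1, cores / running) of a CPU, so equal
        # invocations that start together also finish together after:
        duration = cpu_seconds * max(1.0, running / cores)
        longest = max(longest, duration)
        total += duration
        remaining -= running
    return longest, total


if __name__ == "__main__":
    for thread_max in (10, 100):
        longest, total = simulate_cpu_bound(100, 0.16, cores=1, thread_max=thread_max)
        print(f"threadMax={thread_max}: {longest:.2f} s per invocation, {total:.2f} s total")
```

Consider one core and 0.16 seconds of CPU work per invocation (hypothetical values). With threadMax 10, the model gives about 1.6 seconds per invocation and 16 seconds in total. With threadMax 100, it gives about 16 seconds per invocation and still 16 seconds in total. Longer invocations with an unchanged total is the same pattern as the measurements above. With 4 cores, raising threadMax from 2 to 4 halves the total time in this model, while going beyond 4 does not reduce it further. Because the model ignores context-switch overhead, it cannot show the slight degradation seen in some environments.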
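Does this contradict the observation in the Thread Pool Dispatcher threadMax article (i.e. that a higher "threadMax" gives better performance)? Not really. The key difference is the type of processing performed by the invocations: blocking operations versus CPU intensive operations. A blocking invocation holds a worker thread while using little CPU, so more threads let more blocking invocations wait in parallel. A CPU intensive invocation needs the CPU itself, so more threads only share the same CPU. Most realistic systems have a mixture of simple/fast, blocking and CPU intensive invocations. If "threadMax" is set too low, blocking invocations can occupy all the worker threads and easily increase the latency of simple/fast invocations, which must wait for a thread to become free.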
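The effect on mixed workloads can be illustrated with another hypothetical model: a fast invocation queued behind blocking invocations. The model assumes the blocking invocations all arrive first and that each holds a worker thread for the same length of time without using the CPU. It also assumes invocations are dispatched in arrival order. The function and its parameters are illustrative only, not VisiBroker behavior.

```python
def fast_call_latency(blocking_ahead, blocking_seconds, fast_seconds, thread_max):
    """Latency of a fast invocation queued behind blocking invocations.

    Hypothetical model: `blocking_ahead` blocking invocations arrive first at
    time 0, each occupies a worker thread for `blocking_seconds` (waiting,
    not using the CPU), and invocations are dispatched first-in, first-out.
    The fast invocation needs `fast_seconds` once it gets a thread.
    """
    if thread_max < 1:
        raise ValueError("thread_max must be at least 1")
    # Blocking invocations start in groups of `thread_max`; the fast one is
    # next in line after `blocking_ahead` of them, so it starts once
    # blocking_ahead // thread_max groups have finished.
    start = (blocking_ahead // thread_max) * blocking_seconds
    return start + fast_seconds


if __name__ == "__main__":
    for thread_max in (10, 100):
        latency = fast_call_latency(20, 2.0, 0.01, thread_max)
        print(f"threadMax={thread_max}: fast invocation completes after {latency:.2f} s")
```

Take 20 blocking invocations of 2 seconds each (hypothetical values). With threadMax 10, the fast invocation waits 4 seconds for a free thread. With threadMax 100, it starts immediately.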
The operating system and hardware configuration can also influence the performance of CPU intensive operations. For example, a system with more CPUs or cores can execute more CPU intensive operations in parallel, so on such a system a higher "threadMax" may improve performance.
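You may want a starting value for "threadMax" that reflects both the workload mix and the core count. One commonly cited rule of thumb comes from Brian Goetz et al., Java Concurrency in Practice: threads ≈ CPUs × target CPU utilization × (1 + wait time / compute time). This is general thread-pool guidance, not a VisiBroker recommendation. The helper below is a hypothetical sketch. Treat its result only as a starting point for benchmarking.

```python
import os


def suggested_thread_max(wait_to_compute_ratio, target_utilization=1.0, cpus=None):
    """Rule-of-thumb pool size: cpus * utilization * (1 + wait/compute).

    `wait_to_compute_ratio` is the average time an invocation spends waiting
    (blocked) divided by the time it spends computing: close to 0 for purely
    CPU intensive invocations, larger for blocking ones.
    `cpus` defaults to os.cpu_count(). Hypothetical helper, not VisiBroker API.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    if wait_to_compute_ratio < 0:
        raise ValueError("wait_to_compute_ratio must not be negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return max(1, round(cpus * target_utilization * (1 + wait_to_compute_ratio)))


if __name__ == "__main__":
    print("CPU intensive only:", suggested_thread_max(0.0))
    print("Mostly blocking (wait/compute = 9):", suggested_thread_max(9.0))
```

For purely CPU intensive invocations (ratio near 0), the rule suggests roughly one thread per CPU. Invocations that spend most of their time blocked justify many more threads.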
Note that VisiBroker for Java (VBJ) exhibits similar behavior.
In this article, you have learned that it is important to understand your application and system when tuning performance. You should also run benchmark and stability tests under peak load conditions to find the configuration that gives you a stable application with an acceptable level of performance.