Thread Pool Performance

[[Thread Pool Dispatcher threadMaxIdle|Back]]

In the [[Thread Pool Dispatcher threadMax|Thread Pool Dispatcher threadMax]] article, you saw that the total time taken for all the threads to complete their invocations is significantly shorter when more worker threads are available to do the work. However, this is not always the case.

In this article, you will see that in some cases, adding more worker threads does not lead to significantly better performance.

The "echo_service_cpp" [[Explanation of Example|example]] is used in this demonstration.

Scenario

  • The Client creates 100 threads to call 1 Server concurrently.
  • Each client thread calls the Server once.
  • Each invocation performs CPU-intensive processing at the Server side.
  • Monitor the Server's resource consumption and invocation performance with "threadMax=10" set.
  • Observe the effect of setting "threadMax=100".

How much performance improvement do you expect to see when "threadMax" is increased from 10 to 100?

Preparation

Configure the Client by modifying c.sh:

  • Set the following properties:
    • server_sleep_time 0 (0 means the Server performs CPU-intensive processing)
    • vbroker.agent.enableLocator=false
  • Disable all the other properties.

Configure the Server by modifying s.sh:

  • Set the following properties:
    • vbroker.se.iiop_tp.scm.iiop_tp.dispatcher.threadMax=10
    • vbroker.agent.enableLocator=false
    • vbroker.se.default.local.manager.enabled=false
  • Disable all the other properties.

Execution

Make sure you have set up the necessary VisiBroker environment and built the example before running this demonstration.

  • Start the Server Monitor script:
    • mon_s.sh
  • Start the Server:
    • s.sh 1
  • Start the Client:
    • c.sh 1 1 100
  • Monitor the Server's resource consumption (printed by Server Monitor). E.g.:

MEMORY(KB) THREADS SOCKETS
12352      14     1

  • Note the time for each operation to complete (printed by Server). E.g.:

Client Thread Id:P24282_T99_S1 work's takes 1.22422 seconds to complete.
Client Thread Id:P24282_T27_S1 work's takes 1.69546 seconds to complete.
Client Thread Id:P24282_T60_S1 work's takes 1.25104 seconds to complete.
Client Thread Id:P24282_T95_S1 work's takes 1.40158 seconds to complete.
Client Thread Id:P24282_T16_S1 work's takes 1.36785 seconds to complete.
. . . . . .

  • Note the total time to complete all invocations (printed by Client). E.g.:

Total Time taken for 100 threads in PID 24282 to complete all invocations is 15.9356 seconds

  • Stop the Client and Server.
  • Set the following property at the Server side by modifying s.sh:

vbroker.se.iiop_tp.scm.iiop_tp.dispatcher.threadMax=100

  • Re-start the Server:
    • s.sh 1
  • Re-start the Client:
    • c.sh 1 1 100
  • Monitor the Server's resource consumption (printed by Server Monitor). E.g.:

MEMORY(KB) THREADS SOCKETS
14704      104      1

  • Note the time for each operation to complete (printed by Server). E.g.:

Client Thread Id:P24838_T46_S1 work's takes 14.0853 seconds to complete.
Client Thread Id:P24838_T51_S1 work's takes 14.0938 seconds to complete.
Client Thread Id:P24838_T36_S1 work's takes 14.0769 seconds to complete.
Client Thread Id:P24838_T21_S1 work's takes 14.0713 seconds to complete.
Client Thread Id:P24838_T41_S1 work's takes 14.0797 seconds to complete.
. . . . . .

  • Note the total time to complete all invocations (printed by Client). E.g.:

Total Time taken for 100 threads in PID 24838 to complete all invocations is 15.7661 seconds

Observations

Compare the resource consumption and invocation performance measurements before and after changing the "vbroker.se.iiop_tp.scm.iiop_tp.dispatcher.threadMax" property from 10 to 100 at the Server side.

Configuration                  Server Memory (KB)  Server Threads  Avg Time per Invocation (s)  Total Time Taken (s)
Before tuning (threadMax=10)   12352               14              1.39                         15.9356
After tuning (threadMax=100)   14704               104             14.08                        15.7661

Key observations after tuning:

  • The number of threads created by the Server is significantly higher.
  • The memory usage by the Server is significantly higher.
  • Total time taken for all the threads to complete the invocations is only slightly shorter.
  • The time taken for each operation to complete at the Server side is much longer.
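A quick calculation from the table puts numbers on these observations. The average time per invocation grows by roughly the same factor as "threadMax", the total time improves by only about 1%, and the Server's memory grows by roughly 26 KB per additional thread in this run. The per-thread figure assumes that the whole memory increase reported by the Server Monitor is due to the extra worker threads, which is an approximation.

```latex
\frac{14.08\,\text{s}}{1.39\,\text{s}} \approx 10.1 \approx \frac{100}{10},
\qquad
\frac{15.9356 - 15.7661}{15.9356} \approx 1.1\%,
\qquad
\frac{(14704 - 12352)\,\text{KB}}{(104 - 14)\,\text{threads}} = \frac{2352\,\text{KB}}{90\,\text{threads}} \approx 26\,\text{KB per thread}
```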

Explanation

Some of you may be surprised by the insignificant performance improvement (or even slight degradation in some environments) when 10 times more worker threads are created to service the invocations. Each invocation also takes significantly more time to complete. This seems like a case of poor ROI, but there is an explanation for this phenomenon.

When 100 worker threads service the CPU-intensive requests concurrently, the operating system performs many context switches so that each of the 100 threads gets a slice of the CPU time. As a result, each thread takes longer to complete its operation. When only 10 worker threads service the requests concurrently, fewer context switches occur and each thread gets a larger share of the CPU time, so each thread completes its operation sooner. In both cases, the total amount of CPU work for the 100 invocations is the same, which is why the total time to complete all invocations barely changes.

Does this contradict the observation in the [[Thread Pool Dispatcher threadMax|Thread Pool Dispatcher threadMax]] article (that is, that a higher "threadMax" gives better performance)? Not really. The key difference is the type of processing performed by the invocations: blocking operations in that article versus CPU-intensive operations here. In most realistic systems, there is a mixture of simple/fast, blocking, and CPU-intensive invocations. If you set "threadMax" too low, blocking invocations can occupy all the worker threads and easily increase the latency of simple/fast invocations, which must wait for a free worker thread.

The operating system and hardware configuration can also influence the performance of CPU-intensive operations. For example, a system with more CPUs or cores may be able to run concurrent CPU-intensive operations more efficiently. On such a system, a higher "threadMax" may improve performance.

Note that VisiBroker for Java (VBJ) exhibits similar behavior.

In this article, you have learnt that it is important to understand your application and system during performance tuning. You should also perform benchmark and stability tests under peak load conditions to find the optimal configuration that can give you a stable application with an acceptable level of performance.

[[Thread Pool Dispatcher threadMaxIdle|Back]]  |  [[Other Thread Pool Dispatcher Tuning|Next]]

Revision 2 of 2, last updated 2020-03-13 21:06