Performance Testing Primer
Why Performance Testing?
Because time is money!
Consider the case of an online retailer selling its products over the Internet. Potential customers must interact with the retailer’s Internet application to browse, search for, and buy products. Before customers spend their money, however, they need to feel secure and be able to find the products they want to buy. These customer needs represent the requirements of the application.
Apart from offering great products, a successful online application must:
- Be accessible (connectivity)
- Work correctly (functionally)
- Be convenient (UI and site design, usability)
- Respond quickly (performance)
From the retailer’s perspective, response times are critically important. The faster that customers can purchase products, the more customers that can be served, and the more revenue that the retailer can generate. When response times are slow, potential customers leave and spend their money elsewhere.
Application response times are heavily influenced by the load imposed on an application. This is where performance testing comes into play.
Performance testing encompasses a range of test types. At the most basic level, performance testing involves answering questions about application speed, efficiency, and stability. Performance testing also encompasses resource consumption, capacity planning, and hardware sizing (processors, memory, disk space, and network bandwidth).
Profiling is also considered to be an aspect of performance testing. Profiling is when performance-critical code (algorithms) is examined for speed optimization.
Load testing is another discipline that falls within the larger category of performance testing. Load testing focuses on the server side of application testing.
Today’s online business applications are becoming increasingly complex. They can be comprised of a variety of software and hardware components. The more complex an application is, the more difficult it becomes to gain a solid understanding of the application’s performance.
To avoid unnecessary future business disruption, businesses want to know prior to deployment how an application’s performance will be affected when the application is eventually scaled up to handle maximum load.
Professionals who ensure that an application meets business requirements are called performance engineers. Good performance engineers have extensive knowledge and experience. This paper is intended to help inexperienced performance engineers get started with performance testing, specifically load testing.
Your first step should be to familiarize yourself with the System Under Test (SUT) and identify the typical steps that are involved in using the application, for example, logging in and logging out. Going forward, we will refer to these workflow steps as transactions.
Other typical transactions include:
- Adding items to a shopping cart
- Checking out and purchasing items
If performance requirements are not currently available, you’ll need to define them yourself by setting acceptable response times for specific actions that are executed within each transaction. For simple transactions such as logging in and logging out, simply define an acceptable response time for login and an acceptable response time for logout.
How much time should you allow an application to perform a particular action? Acceptable response times vary. Use your intuition. Click through the application as if you were a real user and calculate acceptable response times based on your expectations.
Here are some sample response time ranges and corresponding general user perceptions of each response time range:
0 - 100 ms
100 - 300 ms
300 - 1000 ms
Machine is struggling
User undergoes mental context switch
User abandons task
This doesn’t mean that all response times must be below 100 ms to be considered acceptable. Users accept longer wait times when they know there is good reason for delay (for example, for transactions that require significant rendering or computing on the server-side).
These defined acceptable response times merely serve as a starting point and may be fine-tuned during testing using your performance-testing tool. Typically, performance-testing tools trigger notifications when acceptable response times are exceeded or when more complex success criteria do not meet with existing service level agreements (SLAs).
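As a minimal sketch of how such thresholds might be checked, the following Python snippet compares measured response times against per-action limits. The action names, the limits, and the `check_thresholds` helper are hypothetical illustrations and do not represent the behavior of any particular load-testing tool.

```python
# Hypothetical acceptable response times per action, in milliseconds.
THRESHOLDS_MS = {"login": 1000, "logout": 500}


def check_thresholds(measurements_ms, thresholds_ms=THRESHOLDS_MS):
    """Return (action, measured, limit) for every measurement over its limit."""
    violations = []
    for action, measured in measurements_ms:
        limit = thresholds_ms.get(action)
        # Actions without a defined threshold are ignored.
        if limit is not None and measured > limit:
            violations.append((action, measured, limit))
    return violations


if __name__ == "__main__":
    samples = [("login", 850), ("login", 1240), ("logout", 310)]
    for action, measured, limit in check_thresholds(samples):
        print(f"{action}: {measured} ms exceeds the {limit} ms threshold")
```

Keeping the limits in one table makes it easy to tighten or relax them as testing refines your expectations.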
Load Testing Tools
The primary purpose of a load testing tool is to generate load against an SUT and help a performance engineer analyze how the server copes with the load. Because an SUT may be built upon a variety of technologies (Internet/Web, middleware, DB, or proprietary technologies such as SAP, OracleForms, Citrix, and more), load testing tools must support a variety of technologies. The ability to test across a wide range of technologies is a significant differentiator among leading load-testing tools.
Load-testing tools allow for the modeling of real-world user interactions with the application as discrete transaction assets, which are used to generate load on the SUT. These transaction assets are usually stored in the form of a test script, a visual format, or a combination thereof.
For each supported technology, a load testing tool either offers a manual scripting approach to modeling user interactions or allows for the capturing of the actions of a real user who interacts with the application. This latter approach is known as the recording approach. The recording approach is usually preferable due to its simplicity.
Web Load Testing
For Web technologies, high-end load testing tools usually allow the user to choose from two load-testing options:
- Protocol-based testing
- Browser-based testing
The difference between these two approaches lies in the way that they generate Web traffic. With the protocol-based approach, traffic is generated using a proprietary HTTP engine that simulates a real-world Web browser. With the browser-based approach, each virtual user uses an actual Web browser to generate load.
Sometimes it’s difficult to decide which approach to use. Here are some facts that should be considered when making your decision:
Pros and cons of the protocol-based approach:
+ Up to 5000 virtual users per machine
+ Detailed measuring and statistics
+ Rich tools support for customizations in Silk Performer
- Requires more scripting
- AJAX-heavy applications may be too complex for scripting
Pros and cons of the browser-based approach:
+ Simple and easy to use
+ Supports asynchronous download/push and other proprietary communication techniques
- Resource hungry: only 30-50 virtual users per machine
- Rudimentary measuring and statistics
- Not as mature as the protocol-based testing approach
Getting Started – Single User Profiling
Looking in-depth at the transactions of a single user is a great way to identify performance issues without driving any load against the SUT. Performance issues often become apparent when analyzing the traffic of a single user. Such flaws can have a massive impact on server performance during actual load testing. If a single user stresses the server more than it should, under load the stress will be amplified and thereby bring the server down entirely or at a minimum dramatically increase response times.
Record the defined transactions one at a time and watch how data flows between the server and the client with each transaction (for example, login/logout, creating/deleting user stories, adding/deleting tasks to/from user stories, etc.).
What to look at within each transaction:
- Response times
- Amount of data transferred (up and down)
- Download pattern of pages (sequentially or simultaneously)
- Which data are being transferred? Is it necessary that all the data be transferred?
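The checks listed above can be sketched in a few lines of Python. This example assumes that the recorded requests of one page have already been exported with their timings and sizes; the `Request` type, its field names, and `summarize_page` are hypothetical and only illustrate the kind of figures worth collecting.

```python
from dataclasses import dataclass


@dataclass
class Request:
    url: str
    start_ms: int        # offset from the start of the page load
    end_ms: int
    bytes_sent: int
    bytes_received: int


def summarize_page(requests):
    """Aggregate the single-user figures for one recorded page (non-empty list)."""
    ordered = sorted(requests, key=lambda r: r.start_ms)
    # Downloads overlap when a request starts before the previous one has finished.
    parallel = any(b.start_ms < a.end_ms for a, b in zip(ordered, ordered[1:]))
    return {
        "response_time_ms": max(r.end_ms for r in ordered) - ordered[0].start_ms,
        "bytes_up": sum(r.bytes_sent for r in ordered),
        "bytes_down": sum(r.bytes_received for r in ordered),
        "download_pattern": "simultaneous" if parallel else "sequential",
    }
```

A page whose summary shows a large `bytes_down` value or a purely sequential download pattern is a natural candidate for closer inspection before any load is applied.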
For example, the performance and load-testing tool Silk Performer offers page statistics that include break-downs in a typical waterfall view:
Real users do not generate continuous server load. Typically, requests are grouped together to download the documents required for a specific Web page. The time that elapses between such request groups is called think time. From the server perspective, think time is the time between transactions in which the user does not request resources.
Having think times modeled in transactions is important for simulating real-user behavior, which is why load testing tools usually capture think times when they occur during recording.
Sometimes, real users do unexpected things like re-load pages that take too long to download, or they proceed to the next link/page before the current page has loaded completely.
Modeling transactions as realistically as possible is essential because such modeling sets the baseline for transaction response time thresholds and bandwidth considerations.
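To illustrate how think times fit into a modeled transaction, here is a minimal Python sketch of a virtual user that pauses between request groups. The `run_transaction` helper, the default think-time range, and the injectable `sleep` and `rng` parameters are assumptions made for this example; recorded think times from a real tool would normally replace the random values.

```python
import random
import time


def run_transaction(steps, think_time_range_s=(2.0, 8.0), sleep=time.sleep, rng=random):
    """Run each step, pausing for a randomized think time between steps.

    `steps` is a list of zero-argument callables, one per request group.
    Returns the time spent in each step, excluding think time.
    """
    timings = []
    for index, step in enumerate(steps):
        started = time.perf_counter()
        step()
        timings.append(time.perf_counter() - started)
        if index < len(steps) - 1:
            # Think time: the user reads the page before acting again.
            sleep(rng.uniform(*think_time_range_s))
    return timings
```

Measuring each step separately keeps think time out of the reported response times while still spacing the requests the way a real user would.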
Content verifications are automatic checks made during simulation that test whether or not expected content is provided by the SUT. If unexpected content or missing content is received during testing, the corresponding verification check fails. In many applications it is not enough to simply evaluate transaction return codes, such as HTTP/1.1 200 OK. Often, error conditions are reported within a dedicated Web page that contains details of the error, while the response that carried the error details is passed as successful.
Content errors often occur only under load, which is why it is important that such errors be captured during testing along with additional information that may be useful to developers in reproducing the error conditions and fixing any underlying defects.
Typical content verification options for Web transactions include:
- HTML page-title verifications
- HTML content-digest verifications
- Custom content verifications that specify left and right boundaries
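The three verification styles listed above can be sketched in Python as follows. The function names are hypothetical, and the digest shown is a plain SHA-256 hash of the whole page, which is an assumption made for illustration; a real tool may compute its digest differently, for example over selected parts of the page only.

```python
import hashlib
import re


def verify_title(html, expected_title):
    """Page-title check: the <title> text must match exactly."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match is not None and match.group(1).strip() == expected_title


def content_digest(html):
    """Digest check: compare this value with one captured at recording time."""
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def extract_between(html, left, right):
    """Boundary check: return the text between `left` and `right`, or None."""
    start = html.find(left)
    if start == -1:
        return None
    start += len(left)
    end = html.find(right, start)
    return None if end == -1 else html[start:end]
```

An error page delivered with a `200 OK` status passes a return-code check, but it fails `verify_title` and yields `None` from `extract_between`, which is exactly the kind of content error these checks are meant to catch.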
The concept of user types is designed to group users who share common attributes.
Typical attributes shared by user types are:
- Browsers (Internet Explorer, Chrome, Firefox, iPhone, etc.)
- Bandwidth limitations
- User behavior (for example, first time users vs. revisiting users)
In Silk Performer, user types are comprised of a test script (a set of transactions), a user group, and a user-settings profile.
Workload defines load characteristics. Workload is built around one of several workload models and is combined with one or more user types.
Typical workload models include:
- Increasing (ramp up phase, steady phase)
- Steady (same load level throughout entire test)
- Queuing (certain number of transactions within a certain time period)
- All-day (different load levels at different times)
- Manual (manual adjustment of load level)
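Some of these workload models can be expressed as simple functions that return the target number of virtual users at a given point in the test. The sketch below covers the increasing, steady, and all-day models; the function names and parameters are hypothetical and do not correspond to the configuration options of a specific tool.

```python
def increasing(t_s, max_users, ramp_up_s):
    """Ramp linearly up to max_users, then hold that level (steady phase)."""
    if t_s >= ramp_up_s:
        return max_users
    return int(max_users * t_s / ramp_up_s)


def steady(t_s, users):
    """Same load level for the entire test."""
    return users


def all_day(t_s, schedule):
    """Different load levels at different times.

    `schedule` is a list of (start_s, users) pairs sorted by start time.
    """
    current = 0
    for start_s, users in schedule:
        if t_s >= start_s:
            current = users
    return current
```

For example, `increasing(30, 100, 60)` asks for 50 virtual users halfway through a 60-second ramp-up to 100 users.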
Workload is used to test or prove one or more performance goals, such as:
- Transactions per second
- Hits per second
- Parallel sessions
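These goals are related to one another. For a closed workload in which each virtual user waits for a response and then pauses for a think time, Little's Law gives a rough estimate of throughput from the number of parallel users. The sketch below applies that relationship; the function names and figures are illustrative assumptions, and a real system under load will deviate from the estimate as response times grow.

```python
import math


def estimated_tps(virtual_users, response_time_s, think_time_s):
    """Throughput estimate: users divided by the time of one full cycle."""
    return virtual_users / (response_time_s + think_time_s)


def users_needed(target_tps, response_time_s, think_time_s):
    """Parallel users required to sustain target_tps, rounded up."""
    return math.ceil(target_tps * (response_time_s + think_time_s))


if __name__ == "__main__":
    # 100 users, 0.5 s response time, 9.5 s think time per transaction.
    print(estimated_tps(100, 0.5, 9.5))
    print(users_needed(10, 0.5, 9.5))
```

The same arithmetic shows why think times matter: removing them from a script makes each virtual user generate far more transactions per second than a real user would.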
Typical Load Scenarios
Following is a list of typical workload scenarios that can be implemented during load testing:
- System is exposed to increasing load until it breaks down (load testing)
- System is exposed to load from a small number of clients that interact with the SUT in rapid succession (stress testing)
- System is exposed to load from a large number of clients that interact with the SUT simultaneously (concurrency testing)
- System is exposed to load for a considerable period of time so that resource consumption over time can be monitored (endurance testing)
- System is exposed to spikes in load. For example, load levels spike from moderate to high and then quickly return to moderate levels. Testing evaluates how long it takes for the SUT to recover (spike testing)
Due to the nature of the Web, users can access Web applications from anywhere in the world. Depending on where an application is accessed from, user experience can vary based on network latency from the geographic region where the client is located.
The table below dramatically illustrates the significance of such regional network latency:
With its Cloudburst cloud-based testing offering, Silk Performer offers pre-configured on-demand load test agents in seven regions worldwide, thereby providing a cost-effective approach to adding globally-distributed test agents to your testing infrastructure.
End users commonly have varying bandwidth limitations that affect the application response times they experience. Good load testing tools can simulate such variations in end-user bandwidth limitations.
The SUT may have bandwidth limitations imposed by an Internet service provider or a cloud-based hosting service. Results from single-user tests can be used to estimate the maximum number of virtual users that available bandwidth can support. Bandwidth consumption per user is an important metric for proper sizing of the network infrastructure to meet performance requirements and SLAs.
Note: It’s vital that modeled transactions include think times so that the transactions simulate real users as accurately as possible.
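The bandwidth estimate described above is simple arithmetic. As a sketch, assume a single-user test shows how many bytes one transaction transfers and how long it takes, think times included; the function names and the example figures are hypothetical.

```python
def per_user_bandwidth_bps(bytes_transferred, duration_s):
    """Average bits per second consumed by one user, think times included."""
    return bytes_transferred * 8 / duration_s


def max_users_for_bandwidth(available_bps, bytes_transferred, duration_s):
    """Upper bound on virtual users that the available bandwidth can carry."""
    return int(available_bps // per_user_bandwidth_bps(bytes_transferred, duration_s))


if __name__ == "__main__":
    # One user transfers 1.5 MB over a 60 s transaction: 200,000 bit/s on average.
    # A 100 Mbit/s link then supports at most 500 such users.
    print(max_users_for_bandwidth(100_000_000, 1_500_000, 60))
```

This is an upper bound based on averages. Because real traffic arrives in bursts, the practical limit is usually lower.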
Systems under test can vary dramatically. They may be simple Web servers or they may be multi-tiered systems that include application servers, SOA servers, DB servers, messaging servers, authentication servers, load balancers, proxies, accelerators, firewalls, and more. The more complex an SUT becomes, the more important it is that you pay attention to test coverage of all components in the system.
Each component that an application relies on to work is a potential bottleneck to system performance. Although it makes sense to thoroughly test individual components, end-to-end system testing is also valuable. Experience from customer engagements has shown that so-called last mile tests (end-to-end system testing of an actual production system including all network infrastructure components between end user and application) are essential for gaining insight into how geographic location, bandwidth constraints, and network configurations affect the way customers experience a website.
Virtual Test Infrastructure
Today’s state-of-the-art computer infrastructure is the virtual machine. In fact, many companies have IT policies that forgo physical machines entirely and only allow for the use of virtual infrastructure. There are good reasons for this, one being that hardware resources are shared and idle time is minimized. The problem with this approach is the lack of consistency. A system deployed on a virtual infrastructure may suffer performance issues due to resource consumption caused by other guest images running on the same hardware.
There are ways to minimize the effect of mutual interference between guest systems, but these approaches carry the cost of not utilizing hardware resources as efficiently as would be ideal.
For load testing, virtual infrastructure is both a blessing and a curse. Virtual load generators can be created and purged quickly and efficiently with almost no maintenance effort. The downside to this approach is the performance inconsistency of virtual images. If not configured properly, virtual load generators may cope inconsistently with the number of virtual users assigned by the controller. Inconsistent performance during a test inevitably produces inconsistent results, either for a particular load test or a series of related tests.
Proper configuration of virtual infrastructure for load testing guarantees continuous resource availability of CPU, memory and I/O bandwidth for each virtual load generator. Such ideal configuration is unfortunately not the default when the main requirement of a virtual infrastructure is resource-sharing optimization.
The chart below illustrates how proper configuration of virtual infrastructure (following build 5251) can make a difference in consistency and accuracy of test results:
Accuracy of measurements on virtual machines also deserves careful consideration. Measurements are typically captured using dedicated hardware timers and counters. Since access to specific hardware is virtualized by the VM host system, the accuracy of measurements is dependent upon the quality of the virtualization. Generally speaking, measurements captured on virtual machines are not as accurate as measurements captured on physical hardware. In particular, absolute values of single measurements are unreliable. Comparing averages of measurements captured on virtual machines is usually acceptable, however.
Unwanted Load on Load Generators
The goal of load testing tools is to report results that are as accurate as possible. Several factors can skew the consistency of time measurements and performance. Besides the use of virtual images as load generators, the configuration of load generators is also a factor. In particular, software such as virus scanners, sniffers, and other monitoring tools that hook deep into an operating system can have a significant impact on a system’s performance and behavior.
The following diagram illustrates the difference in CPU consumption of a load generator both with and without a virus scanner installed:
The increased CPU consumption results in a significantly lower Transactions Per Second (TPS) value. As soon as the CPU maxes out on the system with a virus scanner, the TPS reaches its peak level. On the same system without a virus scanner installed, the TPS value still correlates with the increasing number of virtual users.
Note: A load generator that reaches peak CPU capacity can no longer report accurate response times.
Client-side monitoring is usually done automatically by load testing tools, either in real time during the course of testing, or as stored response time data that is gathered during testing and made available for analysis and reporting after testing has concluded.
Server-side monitoring during load testing is equally important for two reasons: first, it enables you to see how the SUT copes with the load and, second, it enables you to correlate client-side and server-side data and thereby identify the root causes of performance problems.
For server-side monitoring, standard monitoring interfaces such as Perfmon on Windows, SNMP, or JMX can be used. There are, however, alternative methods available for capturing server-side performance statistics:
- Parsing performance values from Web pages
- Querying statistics from databases
- Running transactions on SAP
- Remote execution of a vmstat command using rexec or ssh
- Generic scripting
Note: The capturing of performance values puts additional load on servers and networks, especially when a high number of values are monitored simultaneously. Therefore monitoring can have a significant impact on server performance.
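As an example of the vmstat approach listed above, the following Python sketch parses vmstat text that has already been captured, for instance through a remote ssh command. It assumes the common Linux column layout. The sample output is made-up illustrative data, and the `parse_vmstat` and `average_cpu_busy` helpers are hypothetical.

```python
# Illustrative sample only; real output depends on the vmstat version and platform.
SAMPLE_OUTPUT = """
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 402112    0    0     5    12  310  620 35  5 60  0  0
 4  0      0 801220  10240 402180    0    0     0    40  880 1710 70 10 20  0  0
"""


def parse_vmstat(output):
    """Parse vmstat text into a list of dicts keyed by column name."""
    lines = [line.split() for line in output.strip().splitlines()]
    # The column-name row is the first one that starts with "r b".
    header_index = next(i for i, cols in enumerate(lines) if cols[:2] == ["r", "b"])
    header = lines[header_index]
    samples = []
    for cols in lines[header_index + 1:]:
        # Skip repeated headers and anything that is not a full numeric row.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


def average_cpu_busy(samples):
    """Average non-idle CPU percentage, derived from the idle (id) column."""
    return sum(100 - s["id"] for s in samples) / len(samples)


if __name__ == "__main__":
    print(average_cpu_busy(parse_vmstat(SAMPLE_OUTPUT)))
```

Sampling at a modest interval keeps the monitoring overhead mentioned in the note above low while still providing server-side values that can be correlated with client-side response times.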
Borland Silk Tools - Best Practices
- Use Silk Central for load test management:
  - Automate load tests
  - Make test results persistent
  - Test early and often to find performance issues as early as possible
  - Visualize and analyze performance trends
  - Access performance results, performance trends, and analysis reports
- Use Silk Performer’s baseline testing approach to set response time thresholds. Thresholds can be used to trigger errors and warnings when acceptable performance boundaries are exceeded during testing.
- Add content verifications to your test scripts. Content errors often occur only under load.
- Use TrueLog on Error screen captures and data logs to visually analyze the transaction histories that led up to errors uncovered during testing.
- Pay attention to load test agents’ health indicators (CPU, Memory, Responsiveness). Overloaded agent machines report inaccurate results.