Performance testing is a critical component of web application development and maintenance, directly impacting user experience, customer satisfaction, and business revenue. For websites, even minor delays in load time can lead to significant losses in traffic and sales. Understanding how to design effective performance test scenarios is key to ensuring your system can handle real-world demands and deliver a seamless user experience.
Research shows the tangible impact of website performance on business outcomes: a 1-second delay in page load time reduces page views by 11%, customer satisfaction by 16%, and conversions by roughly 7%. Translating that conversion loss to revenue, a website earning 100,000 RMB per day could lose approximately 2.5 million RMB annually from being just 1 second slower than its competitors. Additionally, an analysis of over 150 websites and 1.5 million page views found that increasing page response time from 2 seconds to 10 seconds raises the page abandonment rate by 38%. These figures underscore why robust performance testing is non-negotiable for any web-based business.
While performance testing involves various scenarios—many of which are defined uniquely by individual enterprises—this guide focuses on four core, universally applicable scenarios to fully cover performance requirements: Baseline Scenario, Capacity Scenario, Stability Scenario, and Exception Scenario. Below is a detailed breakdown of how to implement these scenarios effectively in real-world projects.
Before designing specific scenarios, it’s essential to follow a structured performance testing process. As a classic proverb states: “Without rules and compasses, nothing can be shaped into a square or a circle.” A clear process ensures consistency, reduces errors, and provides a roadmap for executing tests efficiently.
After completing all prerequisites (such as environment setup, data preparation, and script development), you can proceed to scenario design and execution. Note that performance analysis is beyond the scope of this guide. For the purposes of this article, we use JMeter as the load testing tool, where TPS (transactions per second) corresponds to the Throughput value in JMeter's Summary Report. We focus solely on backend APIs, excluding frontend load times for JS, CSS, and other page elements. Other load testing tools can also be used—our focus is on scenario design methodology, not tool selection.
A baseline scenario involves testing a single interface in isolation. Its primary goal is to establish a performance baseline for individual components, providing a reference point for evaluating subsequent tests (such as capacity or stability scenarios). This step ensures you understand the performance of each interface before combining them into more complex scenarios.
The choice of testing environment varies by industry and company size:
Internet enterprises: Typically perform performance testing in the production environment, as replicating a full-scale production environment involves high hardware, software, labor, and maintenance costs.
Traditional/financial industries (banks, insurance): Often use dedicated, independent performance testing environments that support convenient testing, tuning, and isolation from production systems.
For testing environments, server and application container configurations should match production as closely as possible. If full consistency is impossible, scaled-down environments must follow a fixed ratio to ensure credible, actionable results.
High-quality test data is critical for accurate performance testing:
Test data should come from desensitized real production data to mirror real-world usage patterns.
Back up the database before testing to facilitate troubleshooting, root cause analysis, and test repeatability.
Data volume must be comparable to production. A database with hundreds of records performs vastly differently from one with millions or tens of millions of records—insufficient data volume leads to untrustworthy test results.
Many engineers make the mistake of using only a small set of test data, which creates unrealistic pressure and invalid results. The number of parameterized data entries must align with real business traffic to ensure tests reflect actual user behavior.
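To reach production-comparable data volumes, synthetic rows can be bulk-generated and then loaded into the database. The sketch below uses only the Python standard library; the table layout and column names (`user_id`, `username`, `phone`) are illustrative assumptions—adapt them to your real schema, and scale the row count up to millions to match production.

```python
import csv
import random
import string

def random_phone():
    """Generate a fake 11-digit phone number (desensitized-style data)."""
    return "1" + "".join(random.choices(string.digits, k=10))

def generate_users(path, count):
    """Write `count` synthetic user rows to a CSV file for bulk loading.

    Column names are illustrative; adapt them to your real schema.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "username", "phone"])
        for i in range(count):
            writer.writerow([i, f"user_{i:08d}", random_phone()])

# Generate 100,000 rows here; for a real test, scale to millions and
# bulk-load with LOAD DATA INFILE / COPY so the table size is
# comparable to production.
generate_users("users.csv", 100_000)
```

Bulk-loading a CSV is usually far faster than inserting row by row, which matters when you need tens of millions of records before a test run.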
The required parameter data volume depends on the specific business scenario. Below are common examples to guide your parameterization strategy:
Assign one user account per thread, with the same user performing repeated operations. This is suitable for scenarios where users log in once and stay online for an extended period (e.g., all-day internal system usage). In this case, the number of user accounts should equal the number of threads.
Repeatedly purchasing with one account is unrealistic for e-commerce platforms. User accounts must be planned based on TPS and test duration to simulate real user behavior, ensuring the test reflects how customers interact with the system in production.
A thread can reuse a fixed set of parameters cyclically. For example, 100 threads and 1,000 data entries allow 10 different data entries per thread. There is no fixed rule for the number of entries—decisions must be based on real business requirements and user behavior patterns.
Fully reused parameters (only 1–2 records) rarely reflect real production scenarios and should be avoided.
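One way to size the account pool from TPS and test duration, per the guidance above, is sketched below. The `reuse_factor` parameter is an assumption of this sketch: it captures how many times one account can realistically repeat the business action (1 for strictly unique use, such as one order per account).

```python
import math

def required_accounts(tps, duration_seconds, reuse_factor=1):
    """Estimate how many distinct user accounts a scenario needs.

    tps:              target transactions per second for the scenario
    duration_seconds: planned run length
    reuse_factor:     how many times one account may realistically
                      repeat the business action (1 = strictly unique)
    """
    return math.ceil(tps * duration_seconds / reuse_factor)

# An e-commerce checkout test at 500 TPS for 30 minutes, where one
# account placing more than ~5 orders would be unrealistic:
print(required_accounts(500, 30 * 60, reuse_factor=5))  # 180000
```

For the all-day internal-system pattern described earlier, this calculation is unnecessary: the account count simply equals the thread count.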
Valid parameterized data can come from two primary sources:
Existing records in the database (desensitized to comply with data privacy regulations).
Data generated by the load testing tool (ensuring it matches production data distribution).
All data must satisfy two key conditions: it must match the data distribution in production, and it must meet the data volume requirements of the performance scenario.
Improper parameterization directly undermines test validity: insufficient data leads to low cache usage on application, cache, and database servers, failing to simulate real pressure; it also reduces storage I/O activity. Excessive data, on the other hand, may create unnecessary pressure that does not reflect real-world conditions.
When using database-sourced data, verify the data histogram to ensure distribution matches production. For data generated in the testing environment, always validate its distribution against production data to maintain accuracy.
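A distribution check like the one above can be automated with a simple per-category comparison. This is a minimal sketch, assuming categorical data (e.g., client type) and an arbitrary 5-percentage-point drift threshold; real validations may compare full histograms of numeric columns instead.

```python
from collections import Counter

def distribution(values, bins):
    """Return the fraction of values falling in each labeled bin."""
    counts = Counter(values)
    total = sum(counts.values()) or 1
    return {b: counts.get(b, 0) / total for b in bins}

def max_drift(prod_sample, test_sample):
    """Largest per-category difference between two distributions.

    If any category's share in the test data drifts more than a few
    percentage points from production, the generated data does not
    match the production histogram.
    """
    bins = set(prod_sample) | set(test_sample)
    p = distribution(prod_sample, bins)
    t = distribution(test_sample, bins)
    return max(abs(p[b] - t[b]) for b in bins)

prod = ["mobile"] * 70 + ["desktop"] * 25 + ["tablet"] * 5
test = ["mobile"] * 68 + ["desktop"] * 27 + ["tablet"] * 5
assert max_drift(prod, test) <= 0.05  # within 5 points of production
```

Running such a check before every scenario execution catches generated data that silently diverged from production patterns.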
Stop the baseline scenario once system resource utilization reaches roughly 90%—that is, once the critical resources are effectively saturated. If performance bottlenecks appear during testing, conduct tuning to ensure reasonable TPS and response time before proceeding to other scenarios.
The capacity scenario combines all interfaces from the baseline test in realistic business proportions and runs them together. Its core purpose is to answer a critical question: What is the maximum online capacity the system can support without compromising performance?
To design an effective capacity scenario, you need to address three key questions: How to determine the ratio between interfaces? How to configure these ratios in the load testing tool? When to stop the capacity test?
The first step is to extract real traffic data from production logs to identify the actual ratio of interface requests. This ensures your capacity scenario mirrors real-world usage. You can extract logs using:
Log platforms (e.g., Lambda platform) to retrieve interface data for a specific time period.
Nginx logs, parsed using Python scripts or Shell commands, to get the number of requests per interface per time period.
ELK/EFK stacks, configured to extract and analyze traffic data in the required format.
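For the Nginx route, a short Python script is enough to count requests per interface. The sketch below assumes the default "combined" access-log format, where the request line appears in quotes; query strings are stripped so variants of the same API are counted together.

```python
import re
from collections import Counter

# Assumes the default "combined" log format; the request path is the
# second token inside the quoted request line, e.g.
# 1.2.3.4 - - [01/Jan/2024:10:00:00 +0800] "GET /api/order HTTP/1.1" 200 512
REQUEST_RE = re.compile(r'"(?:GET|POST|PUT|DELETE|PATCH)\s+(\S+)')

def count_requests(lines):
    """Count requests per interface path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            counts[m.group(1).split("?")[0]] += 1
    return counts

sample = [
    '1.2.3.4 - - [01/Jan/2024:10:00:00 +0800] "GET /api/order?id=1 HTTP/1.1" 200 512',
    '1.2.3.4 - - [01/Jan/2024:10:00:01 +0800] "POST /api/pay HTTP/1.1" 200 64',
    '1.2.3.4 - - [01/Jan/2024:10:00:02 +0800] "GET /api/order HTTP/1.1" 200 512',
]
print(count_requests(sample))  # Counter({'/api/order': 2, '/api/pay': 1})
```

In practice you would feed this `open("access.log")` for the peak time window rather than an in-memory sample, and filter by timestamp first if the file spans multiple traffic patterns.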
If high concurrency for one interface occurs at a different time from others, design multiple capacity scenarios to reflect these different business models. This ensures each scenario accurately represents a specific traffic pattern.
Calculate the proportion of each interface using the following formula: Interface Ratio = (Request Count of Single Interface) / (Total Request Count). This gives you the percentage of traffic each interface receives in production.
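Applying the formula to per-interface counts is a one-liner; the interface paths below are hypothetical examples.

```python
def interface_ratios(request_counts):
    """Convert per-interface request counts into traffic percentages:
    Interface Ratio = single-interface requests / total requests."""
    total = sum(request_counts.values())
    return {api: count / total for api, count in request_counts.items()}

counts = {"/api/search": 6000, "/api/order": 3000, "/api/pay": 1000}
ratios = interface_ratios(counts)
# /api/search 60%, /api/order 30%, /api/pay 10% — these percentages
# are what you later allocate among interfaces in the load tool.
```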
To implement the extracted traffic ratios in JMeter, use two key components:
Constant Throughput Timer (or the Precise Throughput Timer): Controls the overall TPS of the scenario, ensuring it aligns with production traffic levels.
Throughput Controller: Allocates traffic ratios among interfaces, matching the proportions extracted from production logs.
Detailed configuration instructions for these components are widely available online, making it easy to implement the required ratios.
A clear end condition is essential to avoid endless testing. The capacity scenario ends when the system reaches one of the following:
Predefined target TPS (based on business requirements).
Saturation of critical resources (CPU, memory, I/O, etc.).
Performance degradation beyond acceptable thresholds (e.g., response time exceeding user expectations).
This ensures the results of the capacity test align with real production capacity limits and provide actionable insights.
While the capacity scenario verifies the maximum load a system can handle, the stability scenario focuses on long-term service performance. It detects cumulative issues such as memory leaks, connection leaks, or resource exhaustion that may not appear during short-term tests. Execution duration is the most critical metric for stability testing.
A practical formula to determine the required stability test duration is:
Stability Duration (hours) = Total Business Volume / (Peak TPS × 3600)
Example: If the total business volume in one operational cycle is 60,000,000 transactions and the peak TPS from the capacity test is 1000, the stability duration would be 60,000,000 / (1000 × 3600) ≈ 16.67 hours.
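The worked example above can be reproduced with a trivial helper, useful when recalculating the duration as the peak TPS changes between capacity-test rounds:

```python
def stability_duration_hours(total_business_volume, peak_tps):
    """Stability Duration (hours) = Total Volume / (Peak TPS * 3600)."""
    return total_business_volume / (peak_tps * 3600)

hours = stability_duration_hours(60_000_000, 1000)
print(round(hours, 2))  # 16.67 -> run the stability scenario for ~17 hours
```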
The system must run stably at peak TPS for the calculated duration to verify its reliability over extended periods. This ensures the system can handle sustained traffic without performance degradation.
Exception scenario design starts with system architecture analysis to identify potential risk points. After functional testing, most performance exceptions fall into two categories: architecture-level exceptions and performance exceptions caused by capacity limits. To design comprehensive exception scenarios, you must first understand the system’s architecture and potential failure points.
You can simulate exceptions using manual operations or chaos engineering tools. Common methods include:
Host: Power off, reboot, shutdown, or simulate hardware failures.
Network: Disable the network interface (using the ifdown command), simulate packet loss, latency, jitter, or retransmission issues.
Application: Kill the application process, stop the service, or simulate application crashes.
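The manual methods above can be scripted. The sketch below only builds the fault-injection commands and prints them rather than executing them, since they are destructive by design: run them only on dedicated test hosts, as root, with the blast radius confirmed, and wire them into `subprocess.run(...)` yourself. The `tc netem` / `ip link` / `kill` invocations are standard Linux commands; the interface name and PID are placeholders.

```python
import shlex

def netem_delay_cmd(iface, delay_ms, loss_pct):
    """Build a `tc netem` command injecting latency and packet loss.

    Remember to remove the qdisc afterwards:
    `tc qdisc del dev <iface> root`.
    """
    return (f"tc qdisc add dev {iface} root netem "
            f"delay {delay_ms}ms loss {loss_pct}%")

def iface_down_cmd(iface):
    """Take a network interface offline (equivalent to `ifdown`)."""
    return f"ip link set {shlex.quote(iface)} down"

def kill_app_cmd(pid):
    """Forcibly kill the application process to simulate a crash."""
    return f"kill -9 {int(pid)}"

# Print the fault-injection plan instead of executing it.
for cmd in (netem_delay_cmd("eth0", 100, 5),
            iface_down_cmd("eth0"),
            kill_app_cmd(12345)):
    print(cmd)
```

Dedicated chaos engineering tools (e.g., ChaosBlade or Chaos Mesh) package these same faults with automatic rollback, which is safer than raw commands for anything beyond a quick experiment.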
To ensure comprehensive coverage of exception scenarios, follow this structured approach:
List all components in the technical architecture (e.g., servers, databases, network devices, applications).
Analyze potential single points of failure and abnormal behavior for each component.
Design test cases to simulate each identified exception and evaluate the system’s response.
For a more systematic approach, reference the FMEA (Failure Mode and Effects Analysis) model, which helps identify and prioritize potential failures. However, avoid over-engineering—adapt methodologies to your company’s actual environment and develop a custom exception scenario model that fits your system’s unique requirements.
This four-scenario framework—Baseline, Capacity, Stability, and Exception—provides a complete, repeatable, and production-aligned approach to performance testing. By following this methodology, you can ensure your system meets performance requirements, delivers a seamless user experience, and supports business growth. Remember to adjust the framework based on your business characteristics, team tools, and architecture to build a reliable performance evaluation system.