Optimizing Nginx and Python Configuration with Load Testing: A Case Study
Objective
The goal of this experiment was to determine how many concurrent users and requests per second a modest server configuration could handle with a basic Nginx and Python web application setup. The server in question has 1 GB RAM and a 2000 MHz single-core CPU, hosted on Linode.
Initial Setup
We started with a high-performance Nginx server capable of handling 10,000 requests per second for static content. For dynamic content, we configured Nginx to route requests to a Python web application. The web application simulates both CPU-bound work and a database operation, with a total processing time of approximately 20 milliseconds per request, which caps a single backend instance at roughly 50 requests per second (1000 ms / 20 ms).
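The application code itself is not reproduced in this write-up. As a rough sketch, a blocking backend with the behaviour described above might look like the following, where time.sleep() stands in for the ~20 ms of combined CPU and database work and the port argument matches the startup script shown later (the handler details are assumptions, not the actual code):
python
# web_application.py -- a minimal sketch of the blocking backend (not the actual code).
# time.sleep() stands in for the ~20 ms of CPU-bound work plus database access.
import sys
import time
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.02)  # simulate ~20 ms of processing per request
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep request logging quiet during load tests

if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    # HTTPServer processes one request at a time, matching the blocking,
    # single-threaded behaviour observed in the experiments.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()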
The backend Python application was served by a single instance initially, and Nginx was configured to proxy requests to this backend:
nginx
upstream backend {
    server 127.0.0.1:8000;
}

location /py {
    proxy_pass http://backend;
}
Experiment 1: 1 Concurrent User
We used Locust as our load testing tool to simulate concurrent users making requests. With 1 concurrent user, we observed:
Requests per second: ~20
Total requests: ~1200 in 1 minute
p50 response time: 50 ms
p75 response time: 60 ms
While the Python web application processes each request in around 20 ms, the observed p50 latency was roughly 50 ms; the extra ~30 ms is likely network round-trip time and other overheads.
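For reference, the load tests throughout this write-up were driven by a Locust file along these lines. Only the /py path is taken from the Nginx configuration above; the lack of wait time and other details are assumptions rather than the exact script used:
python
# locustfile.py -- a minimal sketch of the load test (assumes Locust 2.x).
from locust import HttpUser, task, constant

class ApiUser(HttpUser):
    wait_time = constant(0)  # no think time between requests

    @task
    def hit_backend(self):
        self.client.get("/py")
A headless run for one of these experiments might look like: locust -f locustfile.py --host http://<server-ip> --headless --users 1 --spawn-rate 1 --run-time 1m.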
Experiment 2: 5 Concurrent Users
Next, we increased the number of concurrent users to 5. Results:
Requests per second: ~45
Total requests: ~2700
p50 response time: 100 ms
p95 response time: 110 ms
The increased latency is expected due to queuing, as multiple requests must wait for the single-threaded Python web application to process them. Notably, throughput did not scale with concurrency: 5x the users produced barely more than 2x the requests per second.
Experiment 3: 10 Concurrent Users
Increasing the number of concurrent users to 10 yielded:
Requests per second: ~47
Total requests: ~2900
p50 response time: 170 ms
p95 response time: 180 ms
At this point, latency increases more noticeably due to queue buildup, while throughput stays roughly the same (~45-50 requests per second). This highlights an important limitation: no matter how much we increase concurrency, the system cannot exceed 50 requests per second, because it has hit the backend's capacity.
Key Observation: Throughput Bottleneck
Despite increasing the number of concurrent users, the requests per second plateaued around 45-50. As the concurrency rises, the latency increases while the throughput remains unchanged. This is a clear sign of a bottleneck in the backend Python application.
The single-threaded, blocking nature of the application is preventing it from processing more than 50 requests per second. Without further optimization, any attempt to increase concurrency beyond this threshold results in increased queuing time and degraded response times.
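This plateau follows directly from Little's Law (concurrency ≈ throughput × latency): with one blocking worker and roughly 20 ms of service time per request, throughput is capped near 1 / 0.020 s = 50 requests per second, and any additional concurrency shows up as queueing delay rather than extra throughput. A back-of-the-envelope model, ignoring the network round-trip time that dominates the single-user numbers:
python
# Rough model of a single blocking worker with ~20 ms of service time per request.
# Throughput is capped at 1 / service_time; extra concurrency only adds queueing delay.
SERVICE_TIME = 0.020  # seconds per request, taken from the ~20 ms figure above

for users in (1, 5, 10):
    throughput = 1 / SERVICE_TIME   # ~50 req/s, regardless of concurrency
    latency = users * SERVICE_TIME  # each request waits behind the others in the queue
    print(f"{users:>2} users -> ~{throughput:.0f} req/s, ~{latency * 1000:.0f} ms per request")
The predicted latencies (~100 ms for 5 users, ~200 ms for 10) line up closely with the p50 values observed above.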
Optimization: Running 10 Python Backend Instances
We observed that CPU utilization for the backend Python application was less than 5%. Since the web application is blocking, we decided to scale horizontally by running 10 instances of the Python application on different ports using the following shell script:
bash
#!/bin/bash
# Start 10 instances of web_application.py on different ports
for i in {1..10}; do
    port=$((8000 + i - 1))
    python3 web_application.py $port >> "server_$port.log" 2>&1 &
    echo $! >> pids.txt  # Save the PID of each process to pids.txt
done
echo "Started 10 Python web applications."
We updated the Nginx configuration to distribute requests among the 10 backend instances using round-robin load balancing:
nginx
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    server 127.0.0.1:8004;
    server 127.0.0.1:8005;
    server 127.0.0.1:8006;
    server 127.0.0.1:8007;
    server 127.0.0.1:8008;
    server 127.0.0.1:8009;
}
This allowed up to 10 requests to be processed in parallel across the Python instances, significantly improving performance.
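As a quick sanity check outside of Locust, firing a handful of requests in parallel should now show per-request latencies close to the single-request figure rather than the queued ~170 ms seen earlier. A minimal sketch, assuming the requests library is installed and Nginx serves /py on the local machine:
python
# Fire 10 requests in parallel and print each round-trip time.
# Assumes the `requests` package and that Nginx serves /py on 127.0.0.1.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1/py"

def timed_get(_):
    start = time.monotonic()
    requests.get(URL, timeout=5)
    return (time.monotonic() - start) * 1000  # elapsed time in milliseconds

with ThreadPoolExecutor(max_workers=10) as pool:
    for elapsed in pool.map(timed_get, range(10)):
        print(f"{elapsed:.0f} ms")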
Experiment 4: 10 Concurrent Users with 10 Backend Instances
With the new setup, we repeated the load test with 10 concurrent users:
Requests per second: ~200 (or 12,000 requests per minute)
Total requests: 12,000
p50 response time: 50 ms
p60 response time: 60 ms
Experiment 5: 50 Concurrent Users
Next, we tested the system with 50 concurrent users, and the results were as follows:
Requests per second: ~470 (or 28,000 requests per minute)
Total requests: 28,000
p50 response time: 100 ms
p95 response time: 110 ms
Experiment 6: 100 Concurrent Users
Finally, we tested the system with 100 concurrent users. The results:
Requests per second: ~480 (or 29,000 requests per minute)
Total requests: ~29,000
p50 response time: Varied from 50 ms to 150 ms throughout the test
p95 response time: Ranged from 175 ms to 350 ms
Key Observations:
Throughput Limit with 1 Backend Instance: In the initial setup, with only one Python backend application, the system was able to handle a maximum throughput of 50 requests per second. This was the processing limit of a single instance.
10x Throughput Increase: After scaling horizontally by running 10 backend instances, we observed a 10x increase in request throughput, reaching up to 500 requests per second. This demonstrates the clear benefit of distributing the load across multiple Python instances.
Latency Improvements: With 10 backend instances and 10 concurrent users, the p50 response time was around 50 ms. After scaling to 100 concurrent users on the same 10 instances, the p50 stayed within 50 ms to 150 ms. This illustrates the system's capacity to handle 10x more users without a significant impact on latency.
Further Scaling and CPU Usage: Interestingly, even with 10 Python backend applications, the CPU was not fully saturated. As a result, we increased the number of backend instances to 30 Python web applications and modified Nginx's load-balancing configuration accordingly (a small helper for generating the larger upstream block is sketched after this list). This allowed us to further scale the system’s throughput.
Throughput Surpassed 1000 Requests Per Second: With 30 backend instances, the system was able to handle over 1000 requests per second, demonstrating a 20x throughput increase from the initial 50 requests per second limit. This shows the server’s capacity to scale horizontally while maintaining acceptable performance levels.
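Hand-editing the upstream block for 30 (or more) servers is tedious, so a small helper can emit the server list for any instance count. A sketch, assuming the same 8000+ port numbering used by the startup script (the script name and output handling are illustrative only):
python
# generate_upstream.py -- print an Nginx upstream block for N backend instances.
# The 8000+ port numbering matches the startup script above; paste or include the
# output in the Nginx configuration and reload Nginx afterwards.
import sys

count = int(sys.argv[1]) if len(sys.argv) > 1 else 30
lines = ["upstream backend {"]
lines += [f"    server 127.0.0.1:{8000 + i};" for i in range(count)]
lines.append("}")
print("\n".join(lines))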
Conclusions
Concurrency Scaling: The system scaled 10x in terms of concurrent users, from 10 users on a single backend instance (already showing queueing delays) to 100 users on 10 instances, while keeping response times consistent and within acceptable ranges.
Throughput Scaling: The system’s throughput scaled 10x, from 50 requests per second with 1 backend instance to 500 requests per second with 10 backend instances. After further scaling to 30 instances, the throughput exceeded 1000 requests per second, a 20x increase from the initial setup.
Latency Maintenance: Even with a 10x increase in concurrent users, the system was able to maintain similar latency (p50, p95) ranges, indicating excellent horizontal scaling without compromising response time.
Modest Server Capacity: This experiment highlights that a modest server configuration with 1 GB RAM and a single-core CPU can be optimized to handle significant traffic loads by leveraging proper horizontal scaling.
Horizontal Scaling Efficiency: The experiment confirms that horizontal scaling (running multiple backend instances) can significantly increase request processing capacity without impacting response times, as long as resource usage like CPU remains within limits.
Next Steps
Further Scaling: Explore testing with more concurrent users and backend instances to identify the next bottleneck (such as network, disk I/O, or CPU).
Backend Optimization: Consider migrating to an asynchronous framework (e.g., FastAPI) or using asynchronous workers (such as gevent with Flask) to further improve throughput and resource efficiency; a sketch of an async variant follows.
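As a starting point for that migration, an asynchronous variant of the backend might look like the following. This is a sketch only, assuming FastAPI and uvicorn are available; asyncio.sleep() stands in for the database call so the worker can serve other requests while it waits:
python
# async_web_application.py -- sketch of an async variant of the backend (not the actual code).
# Assumes FastAPI and uvicorn are installed; asyncio.sleep() stands in for the database call.
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/py")
async def handler():
    await asyncio.sleep(0.02)  # simulated I/O wait; the event loop serves other requests meanwhile
    return {"status": "ok"}

# Run with, for example:
#   uvicorn async_web_application:app --host 127.0.0.1 --port 8000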