Optimizing Nginx and Python Configuration with Load Testing: A Case Study
Objective
The goal of this experiment was to determine how many concurrent users and requests per second a modest server configuration could handle with a basic Nginx and Python web application setup. The server in question has 1 GB RAM and a 2000 MHz single-core CPU, hosted on Linode.
Initial Setup
We started with a high-performance Nginx server capable of handling 10,000 requests per second for static content. For dynamic content, we configured Nginx to route requests to a Python web application. The web application simulates both CPU-bound work and a database operation, with a total processing time of approximately 20 milliseconds per request, which caps a single backend instance at roughly 50 requests per second (1000 ms / 20 ms).
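The application code itself is not reproduced in this write-up. As a rough sketch, a blocking backend with the behaviour described above might look like the following, where time.sleep() stands in for the ~20 ms of combined CPU and database work and the port argument matches the startup script shown later (the handler details are assumptions, not the actual code):
python
# web_application.py -- a minimal sketch of the blocking backend (not the actual code).
# time.sleep() stands in for the ~20 ms of CPU-bound work plus database access.
import sys
import time
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.02)  # simulate ~20 ms of processing per request
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep request logging quiet during load tests

if __name__ == "__main__":
    port = int(sys.argv[1]) if len(sys.argv) > 1 else 8000
    # HTTPServer processes one request at a time, matching the blocking,
    # single-threaded behaviour observed in the experiments.
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()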
The backend Python application was served by a single instance initially, and Nginx was configured to proxy requests to this backend:
nginx
upstream backend {
    server 127.0.0.1:8000;
}

location /py {
    proxy_pass http://backend;
}
Experiment 1: 1 Concurrent User
We used Locust as our load testing tool to simulate concurrent users making requests. With 1 concurrent user, we observed:
Requests per second: ~20
Total requests: ~1200 in 1 minute
p50 response time: 50 ms
p75 response time: 60 ms
While the Python web application processes each request in around 20 ms, the observed p50 latency was roughly 50 ms; the extra ~30 ms is likely network round-trip time and other overheads.
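For reference, the load tests throughout this write-up were driven by a Locust file along these lines. Only the /py path is taken from the Nginx configuration above; the lack of wait time and other details are assumptions rather than the exact script used:
python
# locustfile.py -- a minimal sketch of the load test (assumes Locust 2.x).
from locust import HttpUser, task, constant

class ApiUser(HttpUser):
    wait_time = constant(0)  # no think time between requests

    @task
    def hit_backend(self):
        self.client.get("/py")
A headless run for one of these experiments might look like: locust -f locustfile.py --host http://<server-ip> --headless --users 1 --spawn-rate 1 --run-time 1m.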
Experiment 2: 5 Concurrent Users
Next, we increased the number of concurrent users to 5. Results:
Requests per second: ~45
Total requests: ~2700
p50 response time: 100 ms
p95 response time: 110 ms
The increased latency is expected due to queuing, as multiple requests must wait for the single-threaded Python web application to process them. Notably, throughput did not scale with concurrency: 5x the users produced barely more than 2x the requests per second.
Experiment 3: 10 Concurrent Users
Increasing the number of concurrent users to 10 yielded:
Requests per second: ~47
Total requests: ~2900
p50 response time: 170 ms
p95 response time: 180 ms
At this point, latency increases more noticeably due to queue buildup, while throughput stays roughly the same (~45-50 requests per second). This highlights an important limitation: no matter how much we increase concurrency, the system cannot exceed 50 requests per second, because it has hit the backend's capacity.
Key Observation: Throughput Bottleneck
Despite increasing the number of concurrent users, the requests per second plateaued around 45-50. As the concurrency rises, the latency increases while the throughput remains unchanged. This is a clear sign of a bottleneck in the backend Python application.
The single-threaded, blocking nature of the application is preventing it from processing more than 50 requests per second. Without further optimization, any attempt to increase concurrency beyond this threshold results in increased queuing time and degraded response times.
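This plateau follows directly from Little's Law (concurrency ≈ throughput × latency): with one blocking worker and roughly 20 ms of service time per request, throughput is capped near 1 / 0.020 s = 50 requests per second, and any additional concurrency shows up as queueing delay rather than extra throughput. A back-of-the-envelope model, ignoring the network round-trip time that dominates the single-user numbers:
python
# Rough model of a single blocking worker with ~20 ms of service time per request.
# Throughput is capped at 1 / service_time; extra concurrency only adds queueing delay.
SERVICE_TIME = 0.020  # seconds per request, taken from the ~20 ms figure above

for users in (1, 5, 10):
    throughput = 1 / SERVICE_TIME   # ~50 req/s, regardless of concurrency
    latency = users * SERVICE_TIME  # each request waits behind the others in the queue
    print(f"{users:>2} users -> ~{throughput:.0f} req/s, ~{latency * 1000:.0f} ms per request")
The predicted latencies (~100 ms for 5 users, ~200 ms for 10) line up closely with the p50 values observed above.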
Optimization: Running 10 Python Backend Instances
We observed that CPU utilization for the backend Python application was less than 5%. Since the web application is blocking, we decided to scale horizontally by running 10 instances of the Python application on different ports using the following shell script:
bash
#!/bin/bash
# Start 10 instances of web_application.py on different ports
for i in {1..10}; do
    port=$((8000 + i - 1))
    python3 web_application.py $port >> "server_$port.log" 2>&1 &
    echo $! >> pids.txt  # Save the PID of each process to pids.txt
done
echo "Started 10 Python web applications."
We updated the Nginx configuration to distribute requests among the 10 backend instances using round-robin load balancing:
nginx
upstream backend {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
    server 127.0.0.1:8004;
    server 127.0.0.1:8005;
    server 127.0.0.1:8006;
    server 127.0.0.1:8007;
    server 127.0.0.1:8008;
    server 127.0.0.1:8009;
}
This allowed up to 10 requests to be processed in parallel across the Python instances, significantly improving performance.
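As a quick sanity check outside of Locust, firing a handful of requests in parallel should now show per-request latencies close to the single-request figure rather than the queued ~170 ms seen earlier. A minimal sketch, assuming the requests library is installed and Nginx serves /py on the local machine:
python
# Fire 10 requests in parallel and print each round-trip time.
# Assumes the `requests` package and that Nginx serves /py on 127.0.0.1.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1/py"

def timed_get(_):
    start = time.monotonic()
    requests.get(URL, timeout=5)
    return (time.monotonic() - start) * 1000  # elapsed time in milliseconds

with ThreadPoolExecutor(max_workers=10) as pool:
    for elapsed in pool.map(timed_get, range(10)):
        print(f"{elapsed:.0f} ms")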
Experiment 4: 10 Concurrent Users with 10 Backend Instances
With the new setup, we repeated the load test with 10 concurrent users:
Requests per second: ~200 (or 12,000 requests per minute)
Total requests: 12,000
p50 response time: 50 ms
p60 response time: 60 ms
Experiment 5: 50 Concurrent Users
Next, we tested the system with 50 concurrent users, and the results were as follows:
Requests per second: ~470 (or 28,000 requests per minute)
Total requests: 28,000
p50 response time: 100 ms
p95 response time: 110 ms
Experiment 6: 100 Concurrent Users
Finally, we tested the system with 100 concurrent users. The results:
Requests per second: ~480 (or 29,000 requests per minute)
Total requests: ~29,000
p50 response time: Varied from 50 ms to 150 ms throughout the test
p95 response time: Ranged from 175 ms to 350 ms
Key Observations:
Throughput Limit with 1 Backend Instance: In the initial setup, with only one Python backend application, the system was able to handle a maximum throughput of 50 requests per second. This was the processing limit of a single instance.
10x Throughput Increase: After scaling horizontally by running 10 backend instances, we observed a 10x increase in request throughput, reaching up to 500 requests per second. This demonstrates the clear benefit of distributing the load across multiple Python instances.
Latency Improvements: With 10 backend instances and 10 concurrent users, the p50 response time was around 50 ms. After scaling to 100 concurrent users on the same 10 instances, the p50 stayed within 50 ms to 150 ms. This illustrates the system's capacity to handle 10x more users without a significant impact on latency.
Further Scaling and CPU Usage: Interestingly, even with 10 Python backend applications, the CPU was not fully saturated. As a result, we increased the number of backend instances to 30 Python web applications and modified Nginx's load-balancing configuration accordingly (a small helper for generating the larger upstream block is sketched after this list). This allowed us to further scale the system’s throughput.
Throughput Surpassed 1000 Requests Per Second: With 30 backend instances, the system was able to handle over 1000 requests per second, demonstrating a 20x throughput increase from the initial 50 requests per second limit. This shows the server’s capacity to scale horizontally while maintaining acceptable performance levels.
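Hand-editing the upstream block for 30 (or more) servers is tedious, so a small helper can emit the server list for any instance count. A sketch, assuming the same 8000+ port numbering used by the startup script (the script name and output handling are illustrative only):
python
# generate_upstream.py -- print an Nginx upstream block for N backend instances.
# The 8000+ port numbering matches the startup script above; paste or include the
# output in the Nginx configuration and reload Nginx afterwards.
import sys

count = int(sys.argv[1]) if len(sys.argv) > 1 else 30
lines = ["upstream backend {"]
lines += [f"    server 127.0.0.1:{8000 + i};" for i in range(count)]
lines.append("}")
print("\n".join(lines))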
Conclusions
Concurrency Scaling: The system scaled 10x in terms of concurrent users, from 10 users on a single backend instance (already showing queueing delays) to 100 users on 10 instances, while keeping response times consistent and within acceptable ranges.
Throughput Scaling: The system’s throughput scaled 10x, from 50 requests per second with 1 backend instance to 500 requests per second with 10 backend instances. After further scaling to 30 instances, the throughput exceeded 1000 requests per second, a 20x increase from the initial setup.
Latency Maintenance: Even with a 10x increase in concurrent users, the system was able to maintain similar latency (p50, p95) ranges, indicating excellent horizontal scaling without compromising response time.
Modest Server Capacity: This experiment highlights that a modest server configuration with 1 GB RAM and a single-core CPU can be optimized to handle significant traffic loads by leveraging proper horizontal scaling.
Horizontal Scaling Efficiency: The experiment confirms that horizontal scaling (running multiple backend instances) can significantly increase request processing capacity without impacting response times, as long as resource usage like CPU remains within limits.
Next Steps
Further Scaling: Explore testing with more concurrent users and backend instances to identify the next bottleneck (such as network, disk I/O, or CPU).
Backend Optimization: Consider migrating to an asynchronous framework (e.g., FastAPI) or using asynchronous workers (such as gevent with Flask) to further improve throughput and resource efficiency; a sketch of an async variant follows.
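As a starting point for that migration, an asynchronous variant of the backend might look like the following. This is a sketch only, assuming FastAPI and uvicorn are available; asyncio.sleep() stands in for the database call so the worker can serve other requests while it waits:
python
# async_web_application.py -- sketch of an async variant of the backend (not the actual code).
# Assumes FastAPI and uvicorn are installed; asyncio.sleep() stands in for the database call.
import asyncio

from fastapi import FastAPI

app = FastAPI()

@app.get("/py")
async def handler():
    await asyncio.sleep(0.02)  # simulated I/O wait; the event loop serves other requests meanwhile
    return {"status": "ok"}

# Run with, for example:
#   uvicorn async_web_application:app --host 127.0.0.1 --port 8000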