Handling a high volume of concurrent requests in a Django application with Celery for background tasks can be challenging. This guide will walk you through the necessary steps to optimize your setup for better performance and scalability.
By default, Gunicorn with Django and Celery uses synchronous workers to handle web requests and background tasks. This means:
- Gunicorn: Uses sync workers, each of which handles one request at a time.
- Celery: Uses the prefork pool by default, so each worker process handles one task at a time.
- Blocking Operations: Sync workers block on I/O operations, leading to inefficient resource utilization.
- Scalability: Limited by the number of sync workers; adding more workers increases memory usage significantly.
- Database Connections: Each worker maintains its own database connections, which can quickly exhaust the database connection pool under heavy load.
To make your application asynchronous, you can use Gevent with Gunicorn and Celery.
Gevent is a coroutine-based Python networking library that uses greenlets to handle concurrent operations.
- Install Gevent:
  pip install gevent
- Update your Gunicorn command to use Gevent workers:
  gunicorn config.wsgi:application -w 4 -k gevent
- Update your Celery worker command to use Gevent:
  celery -A config worker --pool=gevent --concurrency=50
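If you prefer a configuration file over command-line flags, the equivalent Gunicorn setup might look like the following sketch; the file name, worker count, and connection limit are illustrative, not prescriptive:

# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 4
worker_class = "gevent"
# Upper bound on simultaneous connections handled by each gevent worker.
worker_connections = 1000

You would then start the server with gunicorn -c gunicorn.conf.py config.wsgi:application.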
However, Gevent cannot monkey patch the PostgreSQL driver (psycopg2 or psycopg2-binary), because it is a C extension rather than pure-Python socket code. This limitation requires an additional step.
Psycogreen is a library that enables async I/O for psycopg2 using greenlet-based libraries like Gevent.
- Install Psycogreen:
  pip install psycogreen
- Monkey patch psycopg2 or psycopg2-binary in your config/__init__.py file:
  from psycogreen.gevent import patch_psycopg
  patch_psycopg()
Note that patch_psycopg should be called as early as possible, before any other library that opens a database connection is imported.
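In a typical Django + Celery layout, config/__init__.py also imports the Celery app; in that case the ordering might look like this sketch (adjust to your own project structure):

# config/__init__.py
from psycogreen.gevent import patch_psycopg

# Patch psycopg2 with gevent-compatible wait callbacks before anything else
# (including the Celery app module) gets a chance to open a database connection.
patch_psycopg()

from .celery import app as celery_app  # noqa: E402

__all__ = ('celery_app',)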
Even with Gevent and Psycogreen, Django can still open far more database connections than PostgreSQL comfortably handles: each concurrent greenlet gets its own connection, so a single worker running 50 greenlets may hold 50 connections. PgBouncer, a lightweight connection pooler for PostgreSQL, helps manage this efficiently.
- Install PgBouncer and configure it to pool connections to your PostgreSQL database.
- Update your database settings in config/settings.py to use PgBouncer:
  DATABASES = {
      'default': {
          'ENGINE': 'django.db.backends.postgresql',
          'NAME': 'your_db_name',
          'USER': 'your_db_user',
          'PASSWORD': 'your_db_password',
          'HOST': '127.0.0.1',
          'PORT': '6432',  # PgBouncer port
      }
  }
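On the PgBouncer side, a minimal pgbouncer.ini might look like the sketch below; the paths, credentials file, and pool sizes are illustrative and should be tuned for your workload. If you choose transaction pooling, note that server-side cursors are not supported, so you would also set DISABLE_SERVER_SIDE_CURSORS to True in the Django database settings.

; pgbouncer.ini (illustrative values)
[databases]
your_db_name = host=127.0.0.1 port=5432 dbname=your_db_name

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20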
To reduce the number of open connections, ensure Django and Celery close connections after each request and task.
Create a some_app/signals.py file with the following content:
from django.core import signals
from django.db import connections


def force_close_old_connections(**kwargs):
    for conn in connections.all(initialized_only=True):
        conn.close()


signals.request_finished.connect(force_close_old_connections)
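For this handler to be registered, some_app/signals.py must actually be imported at startup. A common pattern, sketched here on the assumption that some_app defines an AppConfig, is to import the module in ready():

# some_app/apps.py
from django.apps import AppConfig


class SomeAppConfig(AppConfig):
    name = 'some_app'

    def ready(self):
        # Importing the module connects the request_finished handler above.
        from . import signals  # noqa: F401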
Create a config/celery_fixups.py file with the following content:
from __future__ import annotations

from celery.fixups.django import DjangoFixup, DjangoWorkerFixup


class CustomDjangoWorkerFixup(DjangoWorkerFixup):
    def close_database(self, **kwargs) -> None:
        if not self.db_reuse_max:
            return self._close_database(force=True)
        if self._db_recycles >= self.db_reuse_max * 2:
            self._db_recycles = 0
            self._close_database()
        self._db_recycles += 1


class CustomDjangoFixup(DjangoFixup):
    @property
    def worker_fixup(self):
        if self._worker_fixup is None:
            self._worker_fixup = CustomDjangoWorkerFixup(self.app)
        return self._worker_fixup


def custom_fixup(app, env='DJANGO_SETTINGS_MODULE'):
    return CustomDjangoFixup(app).install()
Update config/celery.py to use the custom fixup:
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')
Celery.builtin_fixups = {
    'config.celery_fixups:custom_fixup',
}
app = Celery('app')
# rest of the code here
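The elided remainder is usually the standard Celery and Django wiring; as a sketch, assuming the conventional setup (your project may differ):

app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()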
To ensure your Django and Celery setup is working as intended, follow these steps to test both Gunicorn and Celery with a simulated slow query.
- Create a Test API View
  Create an API view that executes a slow query using pg_sleep(5) to simulate a 5-second delay (a URL route for this view is sketched after these testing steps):

  from django.http import JsonResponse
  from django.db import connection

  def slow_query_view(request):
      with connection.cursor() as cursor:
          cursor.execute("SELECT pg_sleep(5);")
      return JsonResponse({'message': 'Slow query completed'})
- Run Gunicorn with One Gevent Worker for Testing
  Run your Django application using Gunicorn with one Gevent worker:

  gunicorn config.wsgi:application -w 1 -k gevent

- Send Concurrent Requests
  Use a tool like Apache Benchmark (ab) to send concurrent requests to your API endpoint:

  ab -n 10 -c 10 http://127.0.0.1:8000/slow-query/

  This command sends 10 requests with a concurrency level of 10 to the slow-query endpoint.

- Analyze the Response Time
  Check the output of the ab command to ensure that each response is delivered in about 5 seconds. The total time for all 10 concurrent requests should also be close to 5 seconds, indicating that the Gevent worker is handling multiple requests concurrently.
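The /slow-query/ endpoint used in the benchmark assumes a URL route for the test view. A minimal sketch, with the app and module names (some_app.views) being illustrative:

# config/urls.py
from django.urls import path

from some_app.views import slow_query_view

urlpatterns = [
    path('slow-query/', slow_query_view),
]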
- Create a Test Task
  Create a Celery task that executes a slow query using pg_sleep(5):

  from celery import shared_task
  from django.db import connection

  @shared_task
  def slow_query_task():
      with connection.cursor() as cursor:
          cursor.execute("SELECT pg_sleep(5);")
      return 'Slow query completed'
- Run Celery with One Gevent Worker and High Concurrency
  Run Celery with one Gevent worker and a high concurrency level:

  celery -A config worker --pool=gevent --concurrency=50
- Send Many Tasks
  Send a large number of tasks to Celery to ensure it handles them efficiently. You can do this in a Django management command or directly in a Django shell:

  from your_app.tasks import slow_query_task

  for _ in range(100):
      slow_query_task.delay()
- Check Task Processing Time
  Monitor the Celery worker logs to ensure that tasks are processed concurrently and that each task takes about 5 seconds to complete. With a concurrency of 50, the 100 tasks should finish in roughly two 5-second waves, on the order of 10 seconds rather than the roughly 500 seconds sequential processing would take, demonstrating that Celery is handling tasks concurrently with Gevent.
In this guide, we optimized a Django and Celery setup to handle many concurrent requests by:
- Using Gevent: For concurrent request and task handling with Gunicorn and Celery.
- Using Psycogreen: To enable async I/O for PostgreSQL with Gevent.
- Setting up PgBouncer: To manage database connections efficiently.
- Closing Connections: Ensuring Django and Celery close database connections after requests and tasks.
Each step addresses specific limitations, ensuring a scalable and efficient application capable of handling high concurrency.