Handling a high volume of concurrent requests in a Django application with Celery for background tasks can be challenging. This guide will walk you through the necessary steps to optimize your setup for better performance and scalability.
By default, Gunicorn with Django and Celery uses synchronous workers to handle web requests and background tasks. This means:
- Gunicorn: Uses sync workers, each of which handles one request at a time.
- Celery: Uses the prefork pool by default, so each worker process handles one task at a time.
- Blocking Operations: Sync workers block on I/O operations, leading to inefficient resource utilization.
- Scalability: Limited by the number of sync workers; adding more workers increases memory usage significantly.
- Database Connections: Each worker maintains its own database connections, which can quickly exhaust the database connection pool under heavy load.
To make your application asynchronous, you can use Gevent with Gunicorn and Celery.
Gevent is a coroutine-based Python networking library that uses greenlets to handle concurrent operations.
- Install Gevent:
  pip install gevent
- Update your Gunicorn command to use Gevent workers:
  gunicorn config.wsgi:application -w 4 -k gevent
- Update your Celery worker command to use Gevent:
  celery -A config worker --pool=gevent --concurrency=50
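If you prefer a configuration file over command-line flags, the equivalent Gunicorn setup might look like the following sketch; the file name, worker count, and connection limit are illustrative, not prescriptive:

# gunicorn.conf.py
bind = "0.0.0.0:8000"
workers = 4
worker_class = "gevent"
# Upper bound on simultaneous connections handled by each gevent worker.
worker_connections = 1000

You would then start the server with gunicorn -c gunicorn.conf.py config.wsgi:application.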
However, Gevent cannot monkey patch the PostgreSQL driver (psycopg2 or psycopg2-binary), because it is a C extension rather than pure-Python socket code. This limitation requires an additional step.
Psycogreen is a library that enables async I/O for psycopg2 using greenlet-based libraries like Gevent.
- Install Psycogreen:
  pip install psycogreen
- Monkey patch psycopg2 or psycopg2-binary in your config/__init__.py file:
  from psycogreen.gevent import patch_psycopg
  patch_psycopg()
Note that patch_psycopg should be called as early as possible, before any other library that opens a database connection is imported.
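In a typical Django + Celery layout, config/__init__.py also imports the Celery app; in that case the ordering might look like this sketch (adjust to your own project structure):

# config/__init__.py
from psycogreen.gevent import patch_psycopg

# Patch psycopg2 with gevent-compatible wait callbacks before anything else
# (including the Celery app module) gets a chance to open a database connection.
patch_psycopg()

from .celery import app as celery_app  # noqa: E402

__all__ = ('celery_app',)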
Even with Gevent and Psycogreen, Django can still open far more database connections than PostgreSQL comfortably handles: each concurrent greenlet gets its own connection, so a single worker running 50 greenlets may hold 50 connections. PgBouncer, a lightweight connection pooler for PostgreSQL, helps manage this efficiently.
- Install PgBouncer and configure it to pool connections to your PostgreSQL database.
- Update your database settings in config/settings.py to use PgBouncer:
  DATABASES = {
      'default': {
          'ENGINE': 'django.db.backends.postgresql',
          'NAME': 'your_db_name',
          'USER': 'your_db_user',
          'PASSWORD': 'your_db_password',
          'HOST': '127.0.0.1',
          'PORT': '6432',  # PgBouncer port
      }
  }
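On the PgBouncer side, a minimal pgbouncer.ini might look like the sketch below; the paths, credentials file, and pool sizes are illustrative and should be tuned for your workload. If you choose transaction pooling, note that server-side cursors are not supported, so you would also set DISABLE_SERVER_SIDE_CURSORS to True in the Django database settings.

; pgbouncer.ini (illustrative values)
[databases]
your_db_name = host=127.0.0.1 port=5432 dbname=your_db_name

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 20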
To reduce the number of open connections, ensure Django and Celery close connections after each request and task.
Create a some_app/signals.py file with the following content:
from django.core import signals
from django.db import connections


def force_close_old_connections(**kwargs):
    for conn in connections.all(initialized_only=True):
        conn.close()


signals.request_finished.connect(force_close_old_connections)
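For this handler to be registered, some_app/signals.py must actually be imported at startup. A common pattern, sketched here on the assumption that some_app defines an AppConfig, is to import the module in ready():

# some_app/apps.py
from django.apps import AppConfig


class SomeAppConfig(AppConfig):
    name = 'some_app'

    def ready(self):
        # Importing the module connects the request_finished handler above.
        from . import signals  # noqa: F401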
Create a config/celery_fixups.py file with the following content:
from __future__ import annotations

from celery.fixups.django import DjangoFixup, DjangoWorkerFixup


class CustomDjangoWorkerFixup(DjangoWorkerFixup):
    def close_database(self, **kwargs) -> None:
        if not self.db_reuse_max:
            return self._close_database(force=True)
        if self._db_recycles >= self.db_reuse_max * 2:
            self._db_recycles = 0
            self._close_database()
        self._db_recycles += 1


class CustomDjangoFixup(DjangoFixup):
    @property
    def worker_fixup(self):
        if self._worker_fixup is None:
            self._worker_fixup = CustomDjangoWorkerFixup(self.app)
        return self._worker_fixup


def custom_fixup(app, env='DJANGO_SETTINGS_MODULE'):
    return CustomDjangoFixup(app).install()
Update config/celery.py to use the custom fixup:
import os
from celery import Celery
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'config.settings')
Celery.builtin_fixups = {
    'config.celery_fixups:custom_fixup',
}
app = Celery('app')
# rest of the code here
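The elided remainder is usually the standard Celery and Django wiring; as a sketch, assuming the conventional setup (your project may differ):

app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()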
To ensure your Django and Celery setup is working as intended, follow these steps to test both Gunicorn and Celery with a simulated slow query.
- Create a Test API View
  Create an API view that executes a slow query using pg_sleep(5) to simulate a 5-second delay (a URL route for this view is sketched after these testing steps):

  from django.http import JsonResponse
  from django.db import connection

  def slow_query_view(request):
      with connection.cursor() as cursor:
          cursor.execute("SELECT pg_sleep(5);")
      return JsonResponse({'message': 'Slow query completed'})
- Run Gunicorn with One Gevent Worker for Testing
  Run your Django application using Gunicorn with one Gevent worker:

  gunicorn config.wsgi:application -w 1 -k gevent

- Send Concurrent Requests
  Use a tool like Apache Benchmark (ab) to send concurrent requests to your API endpoint:

  ab -n 10 -c 10 http://127.0.0.1:8000/slow-query/

  This command sends 10 requests with a concurrency level of 10 to the slow-query endpoint.

- Analyze the Response Time
  Check the output of the ab command to ensure that each response is delivered in about 5 seconds. The total time for all 10 concurrent requests should also be close to 5 seconds, indicating that the Gevent worker is handling multiple requests concurrently.
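The /slow-query/ endpoint used in the benchmark assumes a URL route for the test view. A minimal sketch, with the app and module names (some_app.views) being illustrative:

# config/urls.py
from django.urls import path

from some_app.views import slow_query_view

urlpatterns = [
    path('slow-query/', slow_query_view),
]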
- Create a Test Task
  Create a Celery task that executes a slow query using pg_sleep(5):

  from celery import shared_task
  from django.db import connection

  @shared_task
  def slow_query_task():
      with connection.cursor() as cursor:
          cursor.execute("SELECT pg_sleep(5);")
      return 'Slow query completed'
- Run Celery with One Gevent Worker and High Concurrency
  Run Celery with one Gevent worker and a high concurrency level:

  celery -A config worker --pool=gevent --concurrency=50
- Send Many Tasks
  Send a large number of tasks to Celery to ensure it handles them efficiently. You can do this in a Django management command or directly in a Django shell:

  from your_app.tasks import slow_query_task

  for _ in range(100):
      slow_query_task.delay()
- Check Task Processing Time
  Monitor the Celery worker logs to ensure that tasks are processed concurrently and that each task takes about 5 seconds to complete. With a concurrency of 50, the 100 tasks should finish in roughly two 5-second waves, on the order of 10 seconds rather than the roughly 500 seconds sequential processing would take, demonstrating that Celery is handling tasks concurrently with Gevent.
In this guide, we optimized a Django and Celery setup to handle many concurrent requests by:
- Using Gevent: For concurrent request and task handling with Gunicorn and Celery.
- Using Psycogreen: To enable async I/O for PostgreSQL with Gevent.
- Setting up PgBouncer: To manage database connections efficiently.
- Closing Connections: Ensuring Django and Celery close database connections after requests and tasks.
Each step addresses specific limitations, ensuring a scalable and efficient application capable of handling high concurrency.