
You can configure Alembic to run migrations on two separate PostgreSQL databases in a single alembic upgrade head command by modifying the env.py script. Here's the approach:

  1. Modify env.py to Manage Multiple Engines:

    • Create two separate SQLAlchemy engines (one for each database) inside the env.py script.
    • Ensure that migrations are applied to both databases.
  2. Use run_migrations_online for Both Databases:

    • Instead of a single engine, define and use two engines.
    • Use context.run_migrations() for each engine.
  3. Example env.py Configuration:

    • Modify env.py to include both engines and execute migrations sequentially.
    • Handle the async engine properly for the database that uses the async driver.
  4. Execute in Your CI/CD Pipeline:

    • Run alembic upgrade head as usual, ensuring both databases are migrated.

Below is a detailed step-by-step guide with exact code snippets and considerations for async engines.

Running Alembic Migrations on Two PostgreSQL Databases with One Command

When your application uses two separate PostgreSQL databases (for example, a primary DB and a secondary DB) and one of them uses SQLAlchemy's async engine, you can configure Alembic to apply migrations to both in one go. Instead of running alembic upgrade head twice (once per database), you can customize Alembic’s configuration to run migrations on both databases sequentially with a single command. This guide explains how to modify alembic.ini and env.py to achieve that, provides code snippets, and discusses potential pitfalls (like versioning conflicts and transaction handling) to consider. The solution is designed to integrate smoothly into CI/CD pipelines without changing your pipeline scripts.

1. Alembic Configuration for Multiple Databases (alembic.ini)

First, update alembic.ini to define both database connections and migration file locations. Alembic supports multiple config sections in one ini file. We will use a main section to list the databases, and separate sections for each database’s connection URL (and optional version directory). For example:

[alembic]
script_location = %(here)s/migrations
databases = db_main, db_async

[db_main]
; migrations for the main DB
sqlalchemy.url = postgresql://user:password@localhost:5432/main_db
version_locations = %(here)s/migrations/db_main

[db_async]
; migrations for the async DB
sqlalchemy.url = postgresql+asyncpg://user:password@localhost:5432/async_db
version_locations = %(here)s/migrations/db_async

In this setup:

  • The databases option under [alembic] lists the database keys (here db_main and db_async) (Re: [sqlalchemy] Does alembic support multiple databases?). This is a comma-separated list that we will parse in env.py.
  • Each database has its own section ([db_main] and [db_async]) with its sqlalchemy.url. The second database uses postgresql+asyncpg to indicate an async driver.
  • We specify version_locations for each database, pointing to separate sub-directories for migration scripts. This ensures each database maintains its own migration history and version files (Managing Alembic Migrations with a single alembic.ini & env.py – LearningToTest). (These directories should exist within the script_location path.)

Why separate migration folders? Keeping migrations for each DB in distinct directories prevents revision ID conflicts and keeps schema changes isolated per database. Alembic’s multi-database template by default would use a single alembic_version table to track all databases, which is undesirable (Managing Alembic Migrations with a single alembic.ini & env.py – LearningToTest) – we want each database to track its own migrations independently. By using separate version_locations, each database will have its own alembic_version table and revision files, avoiding cross-database version conflicts.
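
For illustration, the resulting layout might look like this (the revision file names are hypothetical; note that the revision IDs differ across the two directories, as recommended below):

migrations/
├── env.py
├── script.py.mako
├── db_main/
│   └── a1b2c3_create_users_table.py
└── db_async/
    └── d4e5f6_create_events_table.py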

Note: The URLs above use literal credentials for illustration. If credentials differ per environment, read them from environment variables in env.py and override each section’s sqlalchemy.url before creating the engines.
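
As a minimal sketch, assuming hypothetical MAIN_DB_URL and ASYNC_DB_URL environment variables (adjust the names to your own convention), you could override the URLs in env.py, after config = context.config is defined as shown in the next section:

import os

# Hypothetical env var names; replace with whatever your pipeline exports.
for section_name, env_var in (("db_main", "MAIN_DB_URL"), ("db_async", "ASYNC_DB_URL")):
    url = os.environ.get(env_var)
    if url:
        # Replace the placeholder URL from alembic.ini for this section
        config.set_section_option(section_name, "sqlalchemy.url", url)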

2. Modifying env.py to Use Two Database Engines

Next, customize env.py so that Alembic connects to both databases and runs their migrations in sequence. We will create two SQLAlchemy engine objects (one standard and one async) and instruct Alembic’s migration context to run each set of migrations. The key steps are:

  • Read the database configurations: Use context.config to get the list of database names and their connection URLs from alembic.ini. For each name in the databases list, load its config section (Re: [sqlalchemy] Does alembic support multiple databases?).
  • Create two engines: For each DB, create a SQLAlchemy engine. Use engine_from_config for the standard (sync) engine, and create_async_engine (or an AsyncEngine wrapper) for the async DB engine.
  • Prepare target metadata: If each database has a different set of models (likely the case), prepare separate MetaData objects or Base.metadata for each. For example, you might have metadata_main and metadata_async variables imported from your model definitions. This ensures Alembic knows which tables belong to which database when auto-generating migrations (python - Using alembic with multiple databases - Stack Overflow).
  • Run migrations for each engine: Loop through the engines. For each one, configure Alembic’s context with the connection and the appropriate metadata, then call context.run_migrations(). We’ll handle the async engine using asyncio.run and Alembic’s connection.run_sync() helper.

Below is a simplified env.py example illustrating these steps:

import asyncio
from logging.config import fileConfig

from sqlalchemy import engine_from_config, pool
from sqlalchemy.ext.asyncio import create_async_engine, AsyncEngine

from alembic import context

# import your metadata objects for each DB
from myapp.db_main.models import Base as BaseMain
from myapp.db_async.models import Base as BaseAsync

# Interpret the config file for Python logging.
config = context.config
fileConfig(config.config_file_name)

# Prepare metadata mapping for each database
target_metadata = {
    "db_main": BaseMain.metadata,
    "db_async": BaseAsync.metadata
}

def run_migrations_for_engine(name: str, engine) -> None:
    """Run Alembic migrations for a given engine (sync or async)."""
    # Define a helper to run migrations given a sync connection
    def do_run_migrations(connection):
        context.configure(connection=connection,
                          target_metadata=target_metadata.get(name))
        with context.begin_transaction():
            context.run_migrations()

    if isinstance(engine, AsyncEngine):
        # Async engine: run migrations in an asyncio event loop
        async def run_async():
            async with engine.connect() as connection:
                await connection.run_sync(do_run_migrations)
            await engine.dispose()
        asyncio.run(run_async())
    else:
        # Synchronous engine: connect and run directly
        with engine.connect() as connection:
            do_run_migrations(connection)
        engine.dispose()

# --- Begin migrations for each database --- #
db_names = config.get_main_option("databases")
for db_name in [name.strip() for name in db_names.split(",")]:
    # Retrieve this DB's config subsection
    section = config.get_section(db_name)
    db_url = section.get("sqlalchemy.url")
    # Create appropriate engine (async vs sync)
    if db_url.startswith("postgresql+asyncpg"):
        engine = create_async_engine(db_url, poolclass=pool.NullPool)
    else:
        engine = engine_from_config(section, prefix="sqlalchemy.", poolclass=pool.NullPool)
    run_migrations_for_engine(db_name, engine)

How this works: We retrieve the comma-separated database names from the config and iterate over them (Re: [sqlalchemy] Does alembic support multiple databases?). For each database, we get its URL and instantiate an engine. If the URL indicates an async driver, we use create_async_engine to get an AsyncEngine; otherwise, we use the standard engine_from_config. In run_migrations_for_engine, we define do_run_migrations to configure the Alembic context and apply migrations within a transaction for a given database connection (Asyncio — Pytest Alembic 0.4.0 documentation). We then check the engine type: if it's async, we open an async connection and use await connection.run_sync(do_run_migrations) to execute the synchronous migration logic against the async connection (Asyncio — Pytest Alembic 0.4.0 documentation). If it's a normal engine, we just connect and call do_run_migrations directly. Each engine’s migrations run one after the other in the loop.

Important details:

  • We call context.configure() separately for each database connection, passing in the appropriate target_metadata. This ensures Alembic’s autogeneration and migrations are aware of only the models for that specific database (python - Using alembic with multiple databases - Stack Overflow). For example, when running db_main migrations, only tables from BaseMain.metadata are considered.
  • Each database’s migration run is wrapped in its own transaction (context.begin_transaction()), which Alembic uses to ensure all operations for that revision either succeed or fail together on that DB. The helper do_run_migrations shows this pattern clearly (Asyncio — Pytest Alembic 0.4.0 documentation).
  • We dispose of each engine after running its migrations (engine.dispose() or await engine.dispose() for async) to free resources, especially important in long-running processes or when using connection pools.

If you also need to support offline mode (alembic upgrade head --sql), you can extend the script to handle context.is_offline_mode() by calling context.configure(url=...) with each database’s URL and then context.run_migrations(), without opening a connection. In many cases, if your CI/CD always runs online migrations, you can skip offline mode entirely.
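
As a rough sketch, reusing the config and target_metadata objects from the env.py example above, offline handling could look like this:

# A minimal sketch of offline ("--sql") support.
if context.is_offline_mode():
    for db_name in [name.strip() for name in config.get_main_option("databases").split(",")]:
        section = config.get_section(db_name)
        context.configure(
            url=section["sqlalchemy.url"],
            target_metadata=target_metadata.get(db_name),
            literal_binds=True,
        )
        with context.begin_transaction():
            context.run_migrations()

Keep in mind that this emits the SQL for both databases into a single output stream; if you need separate SQL files per database, run the offline command once per database instead.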

3. Running the Migrations in CI/CD

With the above configuration, you don’t need to change your CI/CD pipeline script at all. Simply running the usual Alembic upgrade command will now apply migrations to both databases:

$ alembic upgrade head

When this command runs, Alembic loads the single env.py, which in turn executes migrations for db_main and then db_async in sequence. The order follows the databases list in alembic.ini, which the loop in env.py iterates (in our example, db_main runs first, then db_async). Both sets of migrations run under one Alembic invocation.
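
For example, to migrate db_async first, simply reorder the list in alembic.ini:

[alembic]
; db_async now migrates before db_main
databases = db_async, db_main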

Verification: You can test this locally by running alembic upgrade head and checking both databases. Each should have an alembic_version table updated to the latest revision, and all expected schema changes applied.
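
For example, assuming psql can reach both databases locally:

$ psql main_db -c "SELECT version_num FROM alembic_version;"
$ psql async_db -c "SELECT version_num FROM alembic_version;"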

Because the process is automated inside env.py, your pipeline doesn’t need to run multiple commands or pass any special flags (like --name). This keeps your CI/CD configuration simple and unchanged.

4. Considerations and Potential Issues

Before using this setup in production, be aware of a few important considerations:

  • Independent Revision Histories: Since each database has its own migrations directory, each maintains an independent revision chain. This means you will have two different “heads” (one per DB). Alembic will treat the combination as a multi-head environment. Ensure your revision IDs are unique across both directories to avoid confusion. Alembic’s built-in multi-base support will handle multiple heads if configured with multiple version_locations, but you should never merge these branches, because they represent different schemas. Each database’s alembic_version table tracks its own head revision (so it’s fine that upgrade head resolves to a different revision for each database). In short, do not reuse the same revision file for both databases – keep them separate to prevent conflicts.

  • Autogenerate Migrations: If you use alembic revision --autogenerate, you may need to run it separately for each database or implement additional logic. Autogenerating for both in one command is tricky – Alembic would need to know which metadata to compare for which output file. A workaround is to run autogeneration twice (with an environment variable or -x option to pick the database; see the sketch after this list) or use advanced hooks to direct revisions into different folders. In our setup, as long as you call context.configure(..., target_metadata=...) correctly per engine, Alembic will detect changes for that database only (python - Using alembic with multiple databases - Stack Overflow). Just be cautious not to mix metadata, otherwise Alembic might miss differences or generate incorrect combined scripts.

  • Transactional Integrity: Alembic (and PostgreSQL) will wrap each migration script in a transaction. In our sequential run, each database’s migrations are applied in a separate transaction. This means that if migrations succeed on the first database but fail on the second, the first database’s changes remain committed while the second rolls back. You won’t have an automatic “all-or-nothing” guarantee across both databases – cross-database transactions are not supported. In a CI/CD context, a failure will typically stop the process and report an error. Be prepared to handle a scenario where one database is migrated and the other is not (for example, by fixing the issue and re-running the migrations for the second database). The upside is that within each database, Alembic’s normal transactional safety applies (if a migration script fails halfway, that database’s changes roll back). To minimize issues, test your migrations per database, and consider ordering so that the most critical database runs first.

  • Order of Execution: The sequence in which you run the migrations can matter if there are dependencies between the databases (e.g. foreign data wrappers or cross-database links). Usually, databases are independent, but if you have any such relationship, plan the order accordingly. You can control the order by the order of iteration in env.py. In our example, db_main migrations run before db_async. If you needed the opposite, simply swap their order in the databases list in alembic.ini or in the loop.

  • Async Engine Support: We specifically included support for an async engine (AsyncEngine) in env.py. Alembic’s official guidance is to use asyncio.run() with connection.run_sync() for running async migrations (Asyncio — Pytest Alembic 0.4.0 documentation), which is exactly what our solution does. This approach creates a temporary event loop to run the migrations and then closes it. It’s important not to already have an event loop running (for example, if Alembic is invoked from within an async context) – if you do, you might need a different approach to integrate with that loop. In a typical CLI invocation (as in a CI pipeline), this isn’t an issue.

  • Pipeline Environment: Make sure both database URLs are accessible to the environment where migrations run. If using environment variables for secrets, verify that they are set in the CI/CD pipeline. Also, ensure the migrations/db_main and migrations/db_async directories (from our example) are included in your repository and contain the migration scripts for each database. The Alembic command will need access to these to apply the changes.
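
Following up on the autogenerate point above, here is one hedged sketch of restricting the env.py loop to a single database via Alembic’s -x arguments (the db key is our own convention, not an Alembic built-in):

# In env.py, before the migration loop:
x_args = context.get_x_argument(as_dictionary=True)  # collects "-x key=value" pairs
only_db = x_args.get("db")

db_names = [name.strip() for name in config.get_main_option("databases").split(",")]
if only_db:
    # e.g. invoked as: alembic -x db=db_main revision --autogenerate ...
    db_names = [name for name in db_names if name == only_db]

You would then run, for example, alembic -x db=db_main revision --autogenerate -m "main changes" --version-path migrations/db_main so the generated revision file lands in that database’s version directory.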

By addressing these considerations, you can confidently use a single Alembic command to upgrade multiple databases. This unified approach simplifies deployment processes (one command instead of many) while keeping each database’s schema changes properly isolated and tracked.

5. Summary

Using a single alembic upgrade head for two databases is achievable by leveraging Alembic’s flexibility in env.py. We configured Alembic to load two engine connections (one of them async) and run their migrations one after the other. Key points to recap:

  • The databases option under [alembic] lists the database keys; each key gets its own ini section with a sqlalchemy.url and a dedicated version_locations directory, so every database keeps an independent revision history and its own alembic_version table.
  • env.py builds one engine per database (create_async_engine for the asyncpg URL, engine_from_config otherwise) and runs each database’s migrations in sequence with its own target_metadata.
  • The async engine is driven via asyncio.run() and connection.run_sync(), following Alembic’s recommended asyncio pattern.
  • Each database migrates in its own transaction; there is no cross-database atomicity, so order the databases deliberately and be prepared to re-run a failed second migration.

With this setup, you maintain a clean, Pythonic Alembic configuration for multiple databases and streamline your deployment process. You can manage schema migrations for both PostgreSQL databases together while still keeping their histories separate and avoiding conflicts.
