asyncio support for SQLAlchemy (and Flask, and any other blocking-IO library)

This is a cross-post of a comment I just left on the Python bug tracker at https://bugs.python.org/msg373145.

I seem to have two cents to offer, so here they are. An obscure issue on the Python bug tracker is probably not the right place for this, so consider it an early draft of something I may talk about more elsewhere.

> This basically divides code into two islands - async and non-async

Yes, this is the problem, and at the bottom of this apparently somewhat ranty comment is a solution. The good news is that it does not require Python or asyncio to be modified. What concerns me is how everyone has been OK with the current state of affairs for so long, why "asyncio is fundamentally incompatible with library X" is considered acceptable, and how easy it turned out to be to find a workaround. This is not something I would have expected to come up with, kind of like you don't expect to invent Velcro or windshield wipers.

asyncio's approach is what those of us in the library/framework community call "explicit async": you have to mark the functions that will be doing IO, and the points at which IO occurs must also be marked. Long ago this was done with callback functions, then asyncio turned it into decorators and "yield from", and finally PEP 492 turned it into async/await, which is very nicely done. It is of course a feature of asyncio that writing out async/await means your code can, in theory, be clearer about where IO occurs. While I don't totally buy that myself, I'm in favor of that style of coding being available, and it definitely has its own kind of self-satisfaction built in when you do it. That's all great.
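To make the "explicit" part concrete, here is a small standalone sketch (the function names are made up for illustration) in which `async def` marks the functions that may do IO and `await` marks the only points where a task can be suspended. Two tasks running on the same loop interleave exactly at those marked points and nowhere else.

```python
import asyncio


async def worker(name, log):
    # 'async def' marks this function as one that may do IO
    log.append(f"{name} start")
    # 'await' marks the one point where this task can be suspended
    await asyncio.sleep(0)
    log.append(f"{name} end")


async def run_two_workers():
    log = []
    # gather() schedules both coroutines as tasks on the same event loop
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


print(asyncio.run(run_two_workers()))
# ['a start', 'b start', 'a end', 'b end']
```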

But as those of us in the library/framework community also know, asyncio's approach means that libraries like Flask, Django, my own SQLAlchemy, etc. are all automatically "non-workable" with asyncio. These libraries can certainly have asyncio endpoints added to them, but the task as designed is not that simple: to go from an asyncio endpoint, all the way through library code that doesn't care about async, and then down into a networking library that again has asyncio endpoints, the "async" declaration and the "await" (or yield) must be wired through every function and method along the way. This is despite the fact that, away from the endpoints, the points at which IO occurs are fully predictable, which is why libraries like gevent don't need you to mark them. So we are told that libraries need full end-to-end rewrites of all their code to work this way, or else must maintain two codebases, or something like that.
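A minimal sketch of the "wired all the way through" problem, with hypothetical function names: the middle tier does no IO of its own, but because the driver at the bottom is a coroutine, the middle tier must also become `async def` and `await` it. A plain `def` middle tier cannot use `await`, so calling the driver only hands it an un-run coroutine object instead of rows.

```python
import asyncio


async def driver_fetch(query):
    # hypothetical async driver call at the bottom of the stack
    await asyncio.sleep(0)  # stands in for a network round trip
    return f"rows for {query!r}"


def sync_library_middle(query):
    # a blocking-style middle tier can't 'await'; calling the driver just
    # produces a coroutine object, which somebody above must still await
    return driver_fetch(query)


async def async_library_middle(query):
    # the same middle tier, changed only to add 'async' and 'await'
    rows = await driver_fetch(query)
    return rows.upper()


async def endpoint(query):
    # the endpoint must await the middle tier in turn
    return await async_library_middle(query)


print(asyncio.run(endpoint("users")))  # ROWS FOR 'USERS'
```

Calling `sync_library_middle("users")` returns a coroutine rather than a string, so the caller still has to await it; that is exactly how the keywords spread outward through every layer.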

The side effect of this is that a whole bunch of library and framework authors now get to create all-new libraries and frameworks that do exactly the same thing as the existing ones, except that they sprinkle the "async/await" keywords throughout the middle tiers as required. Vague claims that "framework X is faster because it's async" appear, and they are impossible to confirm, because it is unknown how much of the performance gain comes from the "async" aspect and how much comes from having rewritten a new framework from scratch in a completely different way (hint: it's the latter).

In other cases, as if to make it obvious how much the "async/await" keywords amount to boilerplate in the "middle" parts of libraries, the urllib3 project wrote the "unasync" project [1] so that they can simply maintain two separate codebases: one that has "async/await", and another in which those keywords have just been search-and-replaced out.
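The idea behind unasync can be illustrated with a toy transform. This is not unasync's actual implementation or API: as I understand it, the real tool works on the token stream (so it leaves strings and comments alone) and can also rename identifiers. The hypothetical `toy_unasync()` below just uses regular expressions to strip the keywords, which is enough to show how mechanical the difference between the two codebases is.

```python
import re

# (pattern, replacement) pairs applied in order; a naive regex approach
# that, unlike a token-based tool, would also rewrite strings and comments
_RULES = [
    (re.compile(r"\basync def\b"), "def"),
    (re.compile(r"\basync with\b"), "with"),
    (re.compile(r"\basync for\b"), "for"),
    (re.compile(r"\bawait\s+"), ""),
]


def toy_unasync(source: str) -> str:
    """Strip async/await keywords from Python source text (toy version)."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


async_src = (
    "async def fetch(conn):\n"
    "    async with conn.lock:\n"
    "        return await conn.read()\n"
)
print(toy_unasync(async_src))
# def fetch(conn):
#     with conn.lock:
#         return conn.read()
```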

SQLAlchemy has not been "replaced" by this trend, as asyncio database libraries have not really taken off in Python and there are very few actual async drivers. Some folks have written SQLAlchemy-async libraries that use SQLAlchemy's expression system, having done the tedious, redundant and impossible-to-maintain work of replicating enough of SQLAlchemy's execution internals that a modest "sqlalchemy-like" experience can be reproduced with asyncio. But these libraries are cut off from all of the fixes and improvements made to SQLAlchemy itself, and they likely target a smaller subset of SQLAlchemy's behaviors and features in any case. They certainly can't get the ORM working, since the ORM runs lots of SQL executions internally, all of which would have to propagate their "asyncness" outwards through hundreds of functions.

The asyncpg project, one of the few asyncio database drivers that exist, notes in its FAQ that "asyncpg uses asynchronous execution model and API, which is fundamentally incompatible with SQLAlchemy" [2]. Yet we know this is not true, because SQLAlchemy works just fine with gevent and eventlet, with no architectural changes at all. Using libraries like SQLAlchemy or Django with a non-blocking, event-based IO model is commonplace. It's the "explicit" part that is hard, because asyncio is designed without any mediation for code that doesn't publish "async / await" keywords in the middle.

So I finally sat down to figure out how to use the underlying greenlet library (which we all know as the portable version of "Stackless Python") to bridge the gap between asyncio and blocking-style code. It's about 30 lines, and I have SQLAlchemy working with an async front-end to the asyncpg DBAPI, as can be seen at [3], based on the proof of concept at [4]. I'm actually running the full py.test suite inside the asyncio event loop, driving asyncpg through SQLAlchemy's whole battery of thousands of tests, all of them written in purely blocking style. There is no need to add "async / await / yield / etc." anywhere except at the very endpoints: at the top, where the outermost function is called, and at the bottom, where we call into asyncpg directly using a function called await_() that works just like the "await" keyword, only without an "async" function declaration.
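The bridge can be sketched roughly as follows. This is a minimal reconstruction in the spirit of the proof of concept at [4], not a copy of it or of SQLAlchemy's code; the name `greenlet_spawn()` and the `_BridgeGreenlet` class are assumptions for illustration, while `await_()` follows the name used above. It requires the third-party greenlet package. `greenlet_spawn()` runs a plain function in a child greenlet; whenever that function calls `await_()`, the child switches back to the parent greenlet, which is sitting inside a coroutine and can therefore really `await` the object, and then switches the result (or the exception) back in.

```python
import asyncio
import sys

import greenlet


class _BridgeGreenlet(greenlet.greenlet):
    """Hypothetical child greenlet that remembers the greenlet driving it."""

    def __init__(self, fn, driver):
        super().__init__(fn, driver)  # run=fn, parent=driver
        self.driver = driver


def await_(awaitable):
    """Blocking-style 'await', usable from a plain def run by greenlet_spawn()."""
    current = greenlet.getcurrent()
    if not isinstance(current, _BridgeGreenlet):
        raise RuntimeError("await_() called outside of greenlet_spawn()")
    # hand the awaitable to the driver; we resume with its result (or it raises)
    return current.driver.switch(awaitable)


async def greenlet_spawn(fn, *args, **kwargs):
    """Run blocking-style fn(*args, **kwargs) inside the running event loop."""
    child = _BridgeGreenlet(fn, greenlet.getcurrent())
    # start fn; we get back either an awaitable from await_() or fn's result
    result = child.switch(*args, **kwargs)
    while not child.dead:
        try:
            value = await result  # the real 'await', inside a coroutine
        except BaseException:
            # re-raise the failure inside fn, at its await_() call
            result = child.throw(*sys.exc_info())
        else:
            result = child.switch(value)
    return result


def blocking_style_logic(x):
    # a plain 'def' with no async keyword, reading like ordinary blocking code
    a = await_(asyncio.sleep(0, result=x))
    b = await_(asyncio.sleep(0, result=a * 2))
    return a + b


async def main():
    # only the outermost call needs 'await'
    return await greenlet_spawn(blocking_style_logic, 5)


print(asyncio.run(main()))  # 15
```

Only the two ends know about asyncio here: the top-level `await greenlet_spawn(...)` and the bottom-level `await_()` calls. Everything in between is ordinary `def` code, and each call to `greenlet_spawn()` still runs as a normal task, so other coroutines on the loop keep interleaving as usual.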

A day later, someone took the same idea and got Flask to work in an asyncio event loop [5], [5a]. The general idea of using greenlet in this way is also present at [6], so I won't be patenting it today, as oremanj can claim prior art.

Using greenlet, there is no need to break out of the asyncio event loop at all, nor does it change the control flow of parallel coroutines within the loop. It uses greenlet's "switch", quite minimally, to bridge the gap between code that does not push out an "async/await" yield and code that does. There are no threadpools, no alternate event loops, no monkeypatching, just a few greenlet.switch() calls in the right spots. There is a slight performance decrease of about 15%, but in theory one would only be using asyncio if their application is expected to be IO bound in any case (which, as folks who know me know, is another assertion I frequently doubt).
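The primitive doing the work is greenlet's `switch()`, which transfers control, along with a value, between two stacks without any `async`/`await` involvement. A tiny standalone illustration, using a hypothetical `child()` function:

```python
import greenlet


def child(x):
    # runs on its own stack; switching to the parent suspends us right here
    y = greenlet.getcurrent().parent.switch(x + 1)
    # resumed later with whatever value the parent switched back in
    return y * 10


g = greenlet.greenlet(child)  # parent defaults to the current greenlet
first = g.switch(1)   # starts child(1); comes back with 2 from parent.switch()
final = g.switch(3)   # resumes child with y=3; child returns 30 to its parent
print(first, final, g.dead)  # 2 30 True
```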

So to sum up: last week, libraries like Flask and SQLAlchemy were "fundamentally incompatible" with asyncio, and this week they are not. What's confusing me is that I'm not that smart, and this is something all of the affected libraries should have been doing years ago. Really, while I know this is not going to happen, this should be part of asyncio itself, or at least a very standard approach, so that nobody has to assume asyncio means "rewrite all your library code".

As an extra bonus, you can use this greenlet approach to have blocking-style functions right in the middle of an otherwise asyncio application, which means it is also a potential solution to the "lazy-loading" problem. Say you have an asyncio app that does lots of asyncio to talk to microservices, but some functions do database work, and they would really like to just work in a transaction, load some objects and access their attributes without worrying that a SQL statement can't be emitted. This approach makes that possible as well; see ORM lazy loading with the asyncpg driver at [7]. Indeed, if you have a PostgreSQL SQLAlchemy application already written in blocking style, you can use this new extension to drop the entire application into the event loop and use the asyncpg driver, not unlike using gevent, except that nothing is monkeypatched.
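To illustrate the lazy-loading point without a database, here is a self-contained sketch that re-creates a minimal greenlet bridge of the kind described above. The `greenlet_spawn()` / `await_()` pair and the `User` and `fake_async_query()` helpers are all hypothetical, not SQLAlchemy's API, and the third-party greenlet package is required. The expression `user.orders` looks like ordinary attribute access, but on first use it performs an awaited "query" through the event loop.

```python
import asyncio
import sys

import greenlet


class _Bridge(greenlet.greenlet):
    def __init__(self, fn, driver):
        super().__init__(fn, driver)  # run=fn, parent=driver
        self.driver = driver


def await_(awaitable):
    # hypothetical blocking-style 'await' for code run via greenlet_spawn()
    current = greenlet.getcurrent()
    if not isinstance(current, _Bridge):
        raise RuntimeError("await_() called outside of greenlet_spawn()")
    return current.driver.switch(awaitable)


async def greenlet_spawn(fn, *args):
    child = _Bridge(fn, greenlet.getcurrent())
    result = child.switch(*args)
    while not child.dead:
        try:
            value = await result
        except BaseException:
            result = child.throw(*sys.exc_info())
        else:
            result = child.switch(value)
    return result


async def fake_async_query(user_id):
    # hypothetical stand-in for an async driver round trip
    await asyncio.sleep(0)
    return [f"order-{user_id}-{n}" for n in range(2)]


class User:
    """Hypothetical ORM-style object whose 'orders' attribute loads lazily."""

    def __init__(self, user_id):
        self.user_id = user_id
        self._orders = None
        self.load_count = 0

    @property
    def orders(self):
        if self._orders is None:
            # plain attribute access triggers the async "query" via await_()
            self._orders = await_(fake_async_query(self.user_id))
            self.load_count += 1
        return self._orders


def business_logic():
    # no 'await' anywhere in this function
    user = User(7)
    first = user.orders   # lazy load happens here
    user.orders           # already loaded: no further IO
    return first, user.load_count


print(asyncio.run(greenlet_spawn(business_logic)))
# (['order-7-0', 'order-7-1'], 1)
```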

The recipe is simple and so far appears to be very effective. Using greenlet to manipulate the stack is of course "spooky", and I would assume Python devs may argue that it leads to hard-to-debug conditions. I've used gevent and eventlet for many years, and while they do produce some new issues, most of those relate to their monkeypatching of existing modules, particularly around low-level network drivers like pymysql. The actual stack moving around within business logic doesn't seem to produce any difficult new issues. Using plain asyncio has a lot of novel and confusing failure modes too. Accepting the little bit of "spookiness" that greenlet brings is, IMO, a lot less work than rewriting SQLAlchemy, Django ORM, Flask, urllib3, etc. from scratch and maintaining two codebases.

[1]	https://pypi.org/project/unasync/

[2]	https://magicstack.github.io/asyncpg/current/faq.html#can-i-use-asyncpg-with-sqlalchemy-orm

[3]	https://gerrit.sqlalchemy.org/c/sqlalchemy/sqlalchemy/+/2071

[4]	https://gist.github.com/zzzeek/4e89ce6226826e7a8df13e1b573ad354

[5]	https://twitter.com/miguelgrinberg/status/1279894131976921088

[5a]	"Add Async Support" pallets/flask#3412 (comment)

[6]	https://github.com/oremanj/greenback

[7]	https://gerrit.sqlalchemy.org/plugins/gitiles/sqlalchemy/sqlalchemy/+/refs/changes/71/2071/10/examples/asyncio/greenlet_orm.py
