- cacheops watches Django make database queries, and caches the resultsets
- When a record is created or updated, cacheops invalidates all the cached resultsets that might have included that record
cacheops operates on the Django ORM's QuerySet objects. When a QuerySet is executed, cacheops analyzes its WHERE structure to build and cache a simplified list of the fields that were filtered on. After caching the resultset itself, it adds the resultset's cache key to a list of queries that filtered on those same fields with the same values.
When a record is saved, cacheops looks through these lists to find the queries that it thinks the record might have matched, and deletes all the resultsets that have been cached for all of those queries. (There's an article by cacheops' author here that explains the concepts behind this process.)
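A toy model of that scheme, with plain dicts standing in for Redis sets and keys (the key format and function names here are illustrative, not cacheops' actual implementation, and real cacheops enumerates subsets of the trackable fields rather than only full-record matches):

```python
# Toy model of cacheops-style invalidation. Dicts stand in for Redis;
# the "conj:" key format is illustrative, not cacheops' actual one.

cache = {}        # resultset cache: cache_key -> rows
conj_index = {}   # conjunction key -> set of cache keys to invalidate

def conj_key(table, filters):
    # A simplified "conjunction" of the fields a query filtered on.
    parts = "&".join(f"{f}={v}" for f, v in sorted(filters.items()))
    return f"conj:{table}:{parts}"

def cache_queryset(table, filters, cache_key, rows):
    cache[cache_key] = rows
    # Register this resultset under its filter-field conjunction,
    # so a matching save can find and delete it later.
    conj_index.setdefault(conj_key(table, filters), set()).add(cache_key)

def invalidate_record(table, record):
    # On save, find every conjunction the record satisfies and delete
    # all resultsets registered under it.
    for key, cache_keys in list(conj_index.items()):
        _, tbl, parts = key.split(":", 2)
        if tbl != table:
            continue
        filters = dict(p.split("=", 1) for p in parts.split("&"))
        if all(str(record.get(f)) == v for f, v in filters.items()):
            for ck in cache_keys:
                cache.pop(ck, None)
            del conj_index[key]
```

So a cached query like owner_id=42 gets registered under conj:dogs:owner_id=42, and saving any dog with owner_id=42 wipes every resultset registered there.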
Both these operations – updating the invalidation metadata, and looping over it to delete cached resultsets – are done by calling Lua scripts that Redis runs internally, like SQL stored procedures.
Rover's flexible filtering lets clients create any query they can imagine. But cacheops' idea of which queries "might" match a changed record is pretty crude – it ignores any filters on joined tables, any filters on TextFields, any filters with an IN list longer than 8 items, any case-insensitive filters, and others.
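Those rules amount to a predicate along these lines (an illustrative sketch, not cacheops' actual code; the function name and arguments are made up, but the 8-item IN limit is cacheops' real default):

```python
# Sketch of the kinds of filters cacheops declines to track.
# Untracked filters are simply dropped from the conjunction, which
# makes the conjunction match far more records than the query did.

def is_trackable(lookup, value, is_text_field=False, is_joined=False):
    if is_joined:          # filter on a joined table
        return False
    if is_text_field:      # TextField comparisons are skipped
        return False
    if lookup == "in" and len(value) > 8:   # long IN lists
        return False
    if lookup in ("iexact", "icontains", "istartswith", "iendswith"):
        return False       # case-insensitive matches
    return True
```

The consequence is the one described above: a query with only untrackable filters degrades to "any change to this table invalidates it".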
So when it goes looking for queries to invalidate, it finds lots that "match" the saved record. And those queries have lots of cached resultsets – thousands! – so the resulting Redis calls can take multiple seconds to enumerate and delete them all.
Which doesn't sound that bad – but Redis is single-threaded. While it's running the invalidation script, it can't handle any other requests. Which leads to the request-processing delays we're seeing.
Done, and it seems to be helping. But growth will catch up with us if we don't change how we use the cache.
This would let us scale out easily ahead of growing traffic. cacheops invalidation doesn't currently work in a sharded cluster – see this GitHub issue that has been open for years – but maybe we can shard by table as Dennis has suggested.
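Sharding by table could be as simple as routing every key for a table to one node by a stable hash of the table name – then each node still holds all the resultsets and invalidation metadata its Lua scripts need. A sketch (the node list and function name are assumptions, not an existing API):

```python
import zlib

# Hypothetical shard list – not real hosts.
NODES = ["redis://cache-0", "redis://cache-1", "redis://cache-2"]

def node_for_table(table, nodes=NODES):
    # Stable hash of the table name -> shard index. Resultsets and
    # invalidation metadata for a table always land on the same node,
    # so cacheops' Lua scripts still see all the data they need.
    return nodes[zlib.crc32(table.encode()) % len(nodes)]
```

The trade-off is that a hot table can't be split across nodes, but it sidesteps the cross-shard invalidation problem in the GitHub issue.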
Our fork is a year old, and there may be relevant fixes in upstream. Ideally, we'll find that our customizations are all available there now and we can just use upstream instead.
There are some issues with cacheops' query analysis that affect us – for example, it skips TextFields on the assumption that they're large and shouldn't be compared, which isn't true on Postgres; and it skips case-insensitive matches, which we (ugh) use by default. Fixing those would make invalidation faster, since it would delete fewer cached resultsets, and it would leave the valid resultsets in the cache, reducing cache misses.
Dennis has found some optimizations in the Lua scripts.
The Lua script is taking too long to enumerate and delete the invalid resultset caches – and while it runs, Redis can't handle other requests. We could do the enumeration and deletion in Python instead, which would be slower overall, but Redis could handle other work in between commands.
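Batched deletion from Python might look like this (UNLINK is a real Redis command that reclaims memory off the main thread, and redis-py exposes it as client.unlink(); the batch size and the stub client are assumptions for illustration):

```python
def delete_in_batches(client, keys, batch_size=500):
    # Delete keys in small batches so Redis can serve other requests
    # between round trips, instead of blocking for the whole
    # invalidation inside one Lua script. UNLINK (vs DEL) also pushes
    # the actual memory reclamation to a background thread.
    keys = list(keys)
    for i in range(0, len(keys), batch_size):
        client.unlink(*keys[i:i + batch_size])

# A stub standing in for redis.Redis, just to show the call pattern.
class StubRedis:
    def __init__(self):
        self.calls = []

    def unlink(self, *keys):
        self.calls.append(keys)
```

Each UNLINK round trip is a point where Redis can interleave other clients' commands, which is exactly what the single monolithic Lua script denies us.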
Once we're handling invalidation in Python, we could queue those tasks for asynchronous processing using RQ, which is already running to handle Bark tasks. I don't know if this would be fast enough to beat Edit UI save/load rendering though.