Skip to content

Instantly share code, notes, and snippets.

@abulka
Forked from kwarrick/fncache.py
Last active April 24, 2024 11:03
Show Gist options
  • Save abulka/6ab5b2afc5d1adda6f08126a617dd02a to your computer and use it in GitHub Desktop.
Save abulka/6ab5b2afc5d1adda6f08126a617dd02a to your computer and use it in GitHub Desktop.
Redis-backed LRU cache decorator in Python.
#!/usr/bin/env python
__author__ = 'Kevin Warrick'
__email__ = '[email protected], [email protected]'
__version__ = '2.0.0'
import pickle
from collections import namedtuple
from functools import wraps
import inspect
from icecream import ic
ALLOW_NON_REDIS_CACHING = False
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])
_CacheInfoVerbose = namedtuple("CacheInfoVerbose", ["hits", "misses", "maxsize", "currsize", "paramsignatures"])
def redis_lru(maxsize=None, slice=slice(None), conn=None, optimisekwargs=True):
"""
Simple Redis-based LRU cache decorator *.
*conn* Redis connection
*maxsize* maximum number of entries in LRU cache
*slice* slice object for restricting prototype args
*optimisekwargs* convert all parameter signatures into kwargs dict so only one cache
entry needed for semantically equiv. calls
(recommended, default is True)
Original blog post
https://blog.warrick.io/2012/12/09/redis-lru-cache-decorator-in-python.html
Usage is as simple as prepending the decorator to a function,
passing a Redis connection object, and the desired capacity
of your cache.
@redis_lru(maxsize=10000)
def func(foo, bar):
# some expensive operation
return baz
func.init(redis.StrictRedis())
Uses 4 Redis keys, all suffixed with the function name:
lru:keys: - sorted set, stores hash keys
lru:vals: - hash, stores function output values
lru:hits: - string, stores hit counter
lru:miss: - string, stores miss counter
* Functions prototypes must be serializable equivalent!
Python 3 port and enhancements by Andy Bulka, [email protected], June 2021
-------------------------------------------------------------------------
- Python 3 compatibility
- redis-py 3.0 compatibile
- Made some things more like Python 3 functools.lru_cache
- renamed .clear() to .cache_clear()
- renamed .info() to .cache_info()
- .cache_info() now returns namedtuple object like Python 3
functools.lru_cache does
- renamed redis_lru 'capacity' parameter to 'maxsize', allow it to be
None
- Enable passing in `conn` via the decorator
- Added version number to source code
- Raise exception if redis_lru function has no redis connection
- Added cache_clear_entry() method
- Added verbose flag for cache_info()
* Granular cache clearing with cache_clear_entry():
Whilst cache_clear() clears all cache entries for the function, cache_clear_entry() is
more granular, only clearing the particular cache entry matching the parameters passed to
cache_clear_entry(). E.g.
f(1) f(2) f.cache_clear() - all caches are lost
f(1) f(2) f.cache_clear_entry(1) - cache for f(1) deleted, but f(2)
still cached
See https://stackoverflow.com/questions/56413413/lru-cache-is-it-possible-to-clear-only-a-specific-call-from-the-cache
- Advanced discussion on the use of cache_clear_entry()
If you have gone against the recommended default, and passed
optimisekwargs=False to the decorator, please use cache_clear_entry() with great care
since whilst e.g. `f(1)` and `f(param=1)` mean the same, the lru caching system will
cache those two calls as separate entries. Then when you invalidate one style of call
with `f.cache_clear_entry(1)` this leaves the other style of call `f(param=1)` still
cached and returning stale values - even though semantically these two styles of call are
the same. So if you do use cache_clear_entry() in this way, make sure you call it
repeatedly for all possible parameter signatures that you might have used e.g.
`f.cache_clear_entry(1)`; `f.cache_clear_entry(param=1)`.
On the other hand if you have kept the default optimisekwargs=True on the decorator then
you you don't need to worry about any of this - simply call either `f.cache_clear_entry(
1)` or `f.cache_clear_entry(param=1)` and since they are semantically equivalent,
clearing either one will clear the cache for both - because the same cache entry
is used for both function parameter signature - made possible by 'normalising' all calls
into a sorted, single kwarg dict and not using positional parameters at all, meaning the
same cache entry is calculated for all semantically equivalent calls - nice.
* Enhanced verbose flag for cache_info():
You may now pass 'verbose' to cache_info e.g. `cache_info(verbose=True)` which returns a
namedtuple with one additional member "paramsignatures" e.g. `["hits", "misses",
"maxsize", "currsize", "paramsignatures"]`. The additional item "paramsignatures" in the
tuple is a list of all the active parameter signatures being cached for this particular
function.
- Debugging and watching "paramsignatures" using cache_info()
Invalidating a particular parameter signature using the enhanced cache_clear_entry(
...) with those parameters will delete that parameter signature from this list of tuples.
If you are using the default and recommended optimisekwargs=True on the decorator then
all tuples returned by cache_info(verbose=True) will be a kwarg dictionary converted
into a sorted list of tuples, with no positional parameters e.g.
(('bar', 2), ('baz', 'A'), ('foo', 1))
If you are for some reason using optimisekwargs=False on the decorator then
E.g. If you called f.cache_info(verbose=True) and got "paramsignatures" as two signatures
[(1, 2, 'A'), (1, 2, ('baz', 'A'))] then calling f.cache_clear_entry(1, 2, 'A') then
calling f.cache_info(verbose=True) you will see that "paramsignatures" will now be
reported as merely one signature [(1, 2, ('baz', 'A'))]
Tests:
- Added asserts to the tests
- test a second function
- test maxsize of None
- test maxsize of 1 and ensure cache ejection works
- additional tests
Tips:
- Always call `somefunc.init(conn)` with the redis connection otherwise
your function won't cache. Or pass `conn` in via the decorator (new feature in v1.4).
- Call somefunc.cache_clear() at the start of your tests, since cached
results are permanently in redis
Example Usage:
from fncache import redis_lru as lru_cache
from redis_my_module import conn
@lru_cache(maxsize=None, conn=conn)
def getProjectIds(user) -> List[str]:
return 1
# Later somewhere else
getProjectIds.cache_clear()
"""
if maxsize is None:
maxsize = 5000
def decorator(func):
cache_keys = "lru:keys:%s" % (func.__name__,)
cache_vals = "lru:vals:%s" % (func.__name__,)
cache_hits = "lru:hits:%s" % (func.__name__,)
cache_miss = "lru:miss:%s" % (func.__name__,)
lvars = [None] # closure mutable
def add(key, value):
eject()
conn = lvars[0]
conn.incr(cache_miss)
conn.hset(cache_vals, key, pickle.dumps(value))
"""
Python 3, redis-py 3.0 fix
zadd() - Set any number of element-name, score pairs to the key ``name``. Pairs
are specified as a dict of element-names keys to score values. The score values should
be the string representation of a double precision floating point number.
redis-py 3.0 has changed these three commands to all accept a single positional
argument named mapping that is expected to be a dict. For MSET and MSETNX, the
dict is a mapping of key-names -> values. For ZADD, the dict is a mapping of
element-names -> score. https://pypi.org/project/redis/
"""
# conn.zadd(cache_keys, 0, key) # Original Python 2
conn.zadd(cache_keys, {key: 0.0})
return value
def get(key):
conn = lvars[0]
value = conn.hget(cache_vals, key)
if value:
conn.incr(cache_hits)
"""
Python 3, redis-py 3.0 fix
All 2.X users that rely on ZINCRBY must swap the order of amount and value for the
command to continue to work as intended. https://pypi.org/project/redis/
"""
# conn.zincrby(cache_keys, key, 1.0) # Original Python 2
conn.zincrby(cache_keys, 1.0, key)
value = pickle.loads(value)
return value
def eject():
conn = lvars[0]
"""
In python 2.7, the / operator is integer division if inputs are integers.
In python 3 Integer division is achieved by using //
"""
# count = min((maxsize / 10) or 1, 1000) # Original Python 2
count = min((maxsize // 10) or 1, 1000)
if conn.zcard(cache_keys) >= maxsize:
eject = conn.zrange(cache_keys, 0, count)
conn.zremrangebyrank(cache_keys, 0, count)
conn.hdel(cache_vals, *eject)
@wraps(func)
def wrapper(*args, **kwargs):
conn = lvars[0]
if conn:
if optimisekwargs:
# converts args and kwargs into just kwargs, taking into account default params
kwargs = inspect.getcallargs(func, *args, **kwargs)
items = tuple(sorted(kwargs.items()))
else:
items = args + tuple(sorted(kwargs.items()))
key = pickle.dumps(items[slice])
if optimisekwargs:
return get(key) or add(key, func(**kwargs))
else:
return get(key) or add(key, func(*args, **kwargs))
else:
if ALLOW_NON_REDIS_CACHING:
return func(*args, **kwargs) # Original behaviour (deprecated)
else:
raise RuntimeWarning(f"redis_lru - no redis connection has been supplied "
f"for caching calls to '{func.__name__}'")
def cache_info(verbose=False):
conn = lvars[0]
size = int(conn.zcard(cache_keys) or 0)
hits, misses = int(conn.get(cache_hits) or 0), int(conn.get(cache_miss) or 0)
if verbose:
paramsignatures = conn.zrange(cache_keys, 0, 9999)
return _CacheInfoVerbose(hits, misses, maxsize, size, [pickle.loads(sig) for sig in paramsignatures])
# return hits, misses, capacity, size # Original Python 2
return _CacheInfo(hits, misses, maxsize, size)
def cache_clear(*args, **kwargs):
# traditional behaviour of invalidating the entire cache for this decorated function,
# for all parameter signatures - same as Python 3 functools.lru_cache
conn = lvars[0]
if conn:
conn.delete(cache_keys, cache_vals)
conn.delete(cache_hits, cache_miss)
def cache_clear_entry(*args, **kwargs):
# invalidate only the cache entry matching the parameter signature passed into this
# method - very fancy new granular clear functionality ;-) By default, also invalidates
# all semantically equivalent parameter signatures.
conn = lvars[0]
if conn:
if optimisekwargs:
kwargs = inspect.getcallargs(func, *args, **kwargs)
items = tuple(sorted(kwargs.items()))
else:
items = args + tuple(sorted(kwargs.items()))
key = pickle.dumps(items[slice])
conn.hdel(cache_vals, key) # remove cached return value from hash
conn.zrem(cache_keys, key) # remove param score stuff from sorted set
def init(conn):
lvars[0] = conn
if conn:
init(conn)
wrapper.init = init
wrapper.cache_info = cache_info
wrapper.cache_clear = cache_clear
wrapper.cache_clear_entry = cache_clear_entry
return wrapper
return decorator
if __name__ == "__main__":
import redis
conn = redis.StrictRedis()
"""Initial tests"""
num_actual_calls_made = 0
@redis_lru(maxsize=10)
def test(foo, bar, baz=None):
print('called test.')
global num_actual_calls_made
num_actual_calls_made += 1
return True
conn = redis.StrictRedis()
test.init(conn)
test.cache_clear()
assert test.cache_info() == (0, 0, 10, 0)
assert num_actual_calls_made == 0
# first call
test(1, 2, baz='A')
assert test.cache_info() == (0, 1, 10, 1)
assert num_actual_calls_made == 1
# second call, different params thus second cache entry created
test(3, 4, baz='B')
assert test.cache_info() == (0, 2, 10, 2)
assert num_actual_calls_made == 2
# first call again, cache hit - nice
test(1, 2, baz='A')
assert test.cache_info() == (1, 2, 10, 2)
assert num_actual_calls_made == 2
print("hits %d, misses %d, capacity %d, size %d" % test.cache_info())
# another hit, due to first cache entry
test(1, 2, baz='A')
assert test.cache_info() == (2, 2, 10, 2)
assert num_actual_calls_made == 2
# another hit, due to second cache entry
test(3, 4, baz='B')
assert test.cache_info() == (3, 2, 10, 2)
assert num_actual_calls_made == 2
# hit
test(1, 2, baz='A')
assert test.cache_info() == (4, 2, 10, 2)
assert num_actual_calls_made == 2
print("hits %d, misses %d, capacity %d, size %d" % test.cache_info())
# Check _CacheInfo named tuple fields
assert test.cache_info().hits == 4
assert test.cache_info().misses == 2
assert test.cache_info().maxsize == 10
assert test.cache_info().currsize == 2
# clean up
test.cache_clear()
assert test.cache_info() == (0, 0, 10, 0)
"""Test a second function and
also that setting maxsize to None should set maxsize to 5000"""
@redis_lru(maxsize=None)
def somefunc(foo, bar, baz=None):
global num_actual_calls_made
num_actual_calls_made += 1
print('called somefunc.')
return True
somefunc.init(conn)
somefunc.cache_clear()
num_actual_calls_made = 0
assert somefunc.cache_info() == (0, 0, 5000, 0)
assert num_actual_calls_made == 0
# first call
somefunc(1, 2, baz='A')
assert somefunc.cache_info() == (0, 1, 5000, 1)
assert num_actual_calls_made == 1
# call again, get a hit
somefunc(1, 2, baz='A')
assert somefunc.cache_info() == (1, 1, 5000, 1)
assert num_actual_calls_made == 1
print("hits %d, misses %d, capacity %d, size %d" % somefunc.cache_info())
"""Test initialising using the decorator 'conn' parameter, which means
we don't need to make a separate call to somefunc.init(conn)"""
@redis_lru(maxsize=1, conn=conn)
def anotherfunc(foo, bar, baz=None):
global num_actual_calls_made
num_actual_calls_made += 1
print('called anotherfunc.')
return True
num_actual_calls_made = 0
anotherfunc.cache_clear()
assert anotherfunc.cache_info() == (0, 0, 1, 0)
assert num_actual_calls_made == 0
# first call
anotherfunc(1, 2, baz='A')
assert anotherfunc.cache_info() == (0, 1, 1, 1)
assert num_actual_calls_made == 1
# more calls
anotherfunc(1, 2, baz='A')
anotherfunc(1, 2, baz='A')
assert anotherfunc.cache_info() == (2, 1, 1, 1)
assert num_actual_calls_made == 1
# see if ejection works - only room for one cache entry cos maxsize=1
anotherfunc(1, 2, baz='B') # different params will trigger ejection of old cache entry and miss
assert anotherfunc.cache_info() == (2, 2, 1, 1)
assert num_actual_calls_made == 2
anotherfunc(1, 2, baz='B')
assert anotherfunc.cache_info() == (3, 2, 1, 1)
assert num_actual_calls_made == 2
# this call used to be cached but was ejected, so will get a miss, and an ejection of old
anotherfunc(1, 2, baz='A')
assert anotherfunc.cache_info() == (3, 3, 1, 1)
assert num_actual_calls_made == 3
anotherfunc(1, 2, baz='A')
assert anotherfunc.cache_info() == (4, 3, 1, 1) # hit ok again
assert num_actual_calls_made == 3
"""Test using redis_lru without passing in a redis connection - should raise
an exception. You can avoid this exception by initialising the decorator
with the 'conn' parameter, or by calling func.init(conn) before using the
function You can also set ALLOW_NON_REDIS_CACHING in this module to True to
avoid the exception and allow decorated function to be without knowledge of
a redis connection, in which case they will not get cached at all, and you
will never know. I think raising the exception is better, so that you know you
have made a mistake and not supplied a 'conn' to redis."""
@redis_lru()
def improperfunc(foo, bar, baz=None):
pass
failed = False
try:
improperfunc(1, 2, baz='A')
except RuntimeWarning:
failed = True
assert failed
"""
Test invalidating a specific call signature via cache_clear_entry() -
Python's built in lru cache does not have this method, but we do.
Note, this test configures the decorator with `optimisekwargs=False`
which is not the recommended default. In this mode, each unique call signature
gets it own cache entry, and consequently, needs its own cache_clear_entry call to
granularly clear each such cache entry - a lot to keep track of for the user.
"""
@redis_lru(maxsize=10, conn=conn, optimisekwargs=False)
def smartfunc(foo, bar, baz=None):
return 100
num_actual_calls_made = 0
smartfunc.cache_clear()
# make initial call
smartfunc(1, 2, baz='A')
assert smartfunc.cache_info() == (0, 1, 10, 1)
# granular clear
smartfunc.cache_clear_entry(1, 2, baz='A')
assert smartfunc.cache_info() == (0, 1, 10, 0) # note currsize reset to 0
# this should add back the cached entry for these particular function call params
smartfunc(1, 2, baz='A')
assert smartfunc.cache_info() == (0, 2, 10, 1)
# this should give us a hit again
smartfunc(1, 2, baz='A')
assert smartfunc.cache_info() == (1, 2, 10, 1)
"""Test invalidating a specific call signature via cache_clear_entry,
but the other one should remain cached"""
smartfunc.cache_clear()
smartfunc(1, 2, baz='A') # call type 1
smartfunc(1, 2, baz='B') # call type 2
assert smartfunc.cache_info() == (0, 2, 10, 2)
smartfunc(1, 2, baz='A')
smartfunc(1, 2, baz='B')
assert smartfunc.cache_info() == (2, 2, 10, 2) # two nice hits
smartfunc.cache_clear_entry(1, 2, baz='A') # invalidate one of them
smartfunc(1, 2, baz='B')
assert smartfunc.cache_info() == (3, 2, 10, 1) # still get a hit with the other
"""Test invalidating a specific call signature via cache_clear_entry but this uncovers a
gotcha to be aware of in this granular invalidation concept viz. that because the same
semantics can be achieved with different call signatures so this kind of individual cache
clearing can only be used if signatures match exactly. """
smartfunc.cache_clear()
smartfunc(1, 2, baz='A') # call style #1
assert smartfunc.cache_info() == (0, 1, 10, 1)
# this next call, even though semantically the same, registers as a miss and creates a new entry
smartfunc(1, 2, 'A') # call style #2
assert smartfunc.cache_info() == (0, 2, 10, 2)
smartfunc.cache_clear_entry(1, 2, baz='A') # invalidate one parameter signature - call style #1
assert smartfunc.cache_info() == (0, 2, 10, 1)
# this next call reveals the gotcha edge case to be careful of - a call of style #2 should
# ideally be a miss because we just had an invalidation (albeit signature is different).
# Whilst its semantically invalidated, because the signature is different, it still has its
# own active cached entry which will get a hit and pass back stale/wrong information
smartfunc(1, 2, 'A')
assert smartfunc.cache_info() == (1, 2, 10, 1) # should ideally be a miss (0, 3, 10, 1)
"""Test the correct way to use granular cache_clear_entry which avoids this gotcha
Alternatively define the decorator with the default optimisekwargs=True and avoid needing
to make multiple calls to cache_clear_entry().
"""
smartfunc.cache_clear()
smartfunc(1, 2, baz='A') # call style #1
smartfunc(1, 2, 'A') # call style #2
assert smartfunc.cache_info() == (0, 2, 10, 2)
# clear BOTH styles of call (alternatively of course just call cache_clear() with no parameters)
smartfunc.cache_clear_entry(1, 2, baz='A')
smartfunc.cache_clear_entry(1, 2, 'A')
# test, both will be misses, as expected
smartfunc(1, 2, baz='A') # call style #1
smartfunc(1, 2, 'A') # call style #2
assert smartfunc.cache_info() == (0, 4, 10, 2)
"""Test getting slightly more info out of the cache re a particular function
Extra 'paramsignatures' showing all the active parameter signatures being cached. Invalidating
a particular signature will delete it from this list of tuples."""
info = smartfunc.cache_info(verbose=True)
assert info.hits == 0
assert info.misses == 4
assert info.maxsize == 10
assert info.currsize == 2
assert info.paramsignatures == [(1, 2, 'A'), (1, 2, ('baz', 'A'))] # the new verbose info
assert len(info.paramsignatures) == 2
assert (1, 2, 'A') in info.paramsignatures
assert (1, 2, ('baz', 'A')) in info.paramsignatures
"""Test `optimisekwargs=True` which means less cache memory is taken up
storing multiple semantically equivalent function signatures (call styles
#1-#6) only one entry needed. It also means granular cache clearing is much
safer, and will clear that one cache entry, the user need not track all
possible variants of the ways they called the function and call cache_clear_entry
on each one."""
@redis_lru(maxsize=10, conn=conn, optimisekwargs=True)
def optifunc(foo, bar, baz='A'):
return f"{foo + bar} {baz}"
optifunc.cache_clear()
assert optifunc.cache_info() == (0, 0, 10, 0)
assert optifunc(1, 2, baz='A') == "3 A" # call style #1
assert optifunc.cache_info() == (0, 1, 10, 1)
assert optifunc(1, 2, 'A') == "3 A" # call style #2
# note we got a hit and still there is only one entry in cache, even though this
# call was style #2 not style #1
assert optifunc.cache_info() == (1, 1, 10, 1) # <--- we got a hit
# that's because the parameter signature is normalised into a kwargs, no args
assert optifunc.cache_info(verbose=True).paramsignatures == \
[(('bar', 2), ('baz', 'A'), ('foo', 1))]
# all these variants will hit the same cache entry
assert optifunc(1, 2) == "3 A" # call style #3
assert optifunc(1, bar=2) == "3 A" # call style #4
assert optifunc(foo=1, bar=2) == "3 A" # call style #5
assert optifunc(foo=1, bar=2, baz='A') == "3 A" # call style #6
assert optifunc.cache_info() == (5, 1, 10, 1) # all hits, despite the different call parameters
# Now introduce a semantically different call
assert optifunc(foo=2, bar=2) == "4 A"
assert optifunc.cache_info() == (5, 2, 10, 2) # miss, and a 2nd cache entry
# Delete the first cache entry by passing the params of any one of
# the semantically equivalent calls
optifunc.cache_clear_entry(1, 2)
assert optifunc.cache_info() == (5, 2, 10, 1) # one less cache entry
# Ensure the semantically unrelated call is still unaffected by the granular clear
assert optifunc(foo=2, bar=2) == "4 A"
assert optifunc.cache_info() == (6, 2, 10, 1) # hit increments ok
# Make the original call again and recreate the cache entry
assert optifunc(1, bar=2) == "3 A" # call style #4
assert optifunc.cache_info() == (6, 3, 10, 2) # miss, and 2nd cache entry created again, nice
# Let's have a peek at the two cache entry function signatures
assert optifunc.cache_info(verbose=True).paramsignatures[0] == \
(('bar', 2), ('baz', 'A'), ('foo', 1))
assert optifunc.cache_info(verbose=True).paramsignatures[1] == \
(('bar', 2), ('baz', 'A'), ('foo', 2))
"""Test granular cache clearing of function where default parameters allow calling a function
with no parameters at all."""
@redis_lru(maxsize=10, conn=conn, optimisekwargs=True)
def dargsfunc(foo=1, bar=2, baz='A'): # note all params have defaults
return f"{foo + bar} {baz}"
dargsfunc.cache_clear()
assert dargsfunc.cache_info() == (0, 0, 10, 0)
# first call, no params, all default param values are used
assert dargsfunc() == "3 A" # call style #0 - no parameters passed, default parameters are used
assert dargsfunc.cache_info() == (0, 1, 10, 1)
# second call, use all params - param values are the same as default though,
# so same cache entry will be matched
assert dargsfunc(1, 2, baz='A') == "3 A" # call style #1
assert dargsfunc.cache_info() == (1, 1, 10, 1) # one hit, still one cache entry - cool
# Now introduce a semantically different call
assert dargsfunc(foo=2, bar=2) == "4 A"
assert dargsfunc.cache_info() == (1, 2, 10, 2) # one miss, now two cache entries
# clear the original cache entry by calling with no params (which means default param values
# are used to find the matching signature) or by calling using any of the semantically
# equivalent calls. In this next call, it could have been a call to
# `dargsfunc.cache_clear_entry(1, 2, baz='A')` which is semantically equivalent, or any other
# semantically equivalent variant
dargsfunc.cache_clear_entry() # <--- notice, no parameters
assert dargsfunc.cache_info() == (1, 2, 10, 1) # now down to one cache entry
# this next call to a semantically equivalent clear, has no effect since that cache entry
# already has been cleared
dargsfunc.cache_clear_entry(1, 2, baz='A')
assert dargsfunc.cache_info() == (1, 2, 10, 1) # still down to one cache entry
"""cleanup"""
test.cache_clear()
somefunc.cache_clear()
anotherfunc.cache_clear()
smartfunc.cache_clear()
optifunc.cache_clear()
dargsfunc.cache_clear()
print("tests pass")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment