@akorn
Last active September 3, 2024 22:15
Run specified (presumably expensive) command with specified arguments and cache result. If cache is fresh enough, don't run command again but return cached output.
#!/bin/zsh
#
# Purpose: run specified command with specified arguments and cache result. If cache is fresh enough, don't run command again but return cached output.
# Also cache exit status and stderr.
# Use silly long variable names to avoid clashing with whatever the invoked program might use
RUNCACHED_MAX_AGE=300
RUNCACHED_CACHE_DIR=/var/cache/runcached
RUNCACHED_IGNORE_ENV=0
RUNCACHED_IGNORE_PWD=0
[[ -n "$HOME" ]] && RUNCACHED_CACHE_DIR=$HOME/.runcached
function usage() {
    echo "Usage: runcached [--ttl <max cache age>] [--cache-dir <cache directory>]"
    echo " [--ignore-env] [--ignore-pwd] [--help] [--prune-cache]"
    echo " [--] command [arg1 [arg2 ...]]"
    echo
    echo
    echo "Run 'command' with the specified args and cache stdout, stderr and exit"
    echo "status. If you run the same command again and the cache is fresh, cached"
    echo "data is returned and the command is not actually run. Side effects of"
    echo "'command' are not cached."
    echo
    echo "Normally, all environment variables as well as the current working directory"
    echo "are included in the cache key. The --ignore options disable this. The OLDPWD"
    echo "variable is always ignored."
    echo
    echo "--prune-cache deletes all cache entries older than the maximum age. There is"
    echo "no other mechanism to prevent the cache growing without bounds."
    echo
    echo "Note that there is no cache invalidation logic except cache age"
    echo "(specified in seconds)."
    echo
    echo "The default cache directory is ${RUNCACHED_CACHE_DIR}."
    echo "Maximum cache age defaults to ${RUNCACHED_MAX_AGE}."
    echo
    echo "If the cache can't be created, the command is run uncached."
    echo "This script is always silent; any output comes from the invoked command."
    exit 0
}
while [[ -n "$1" ]]; do
    case "$1" in
        --ttl) RUNCACHED_MAX_AGE="$2"; shift 2;;
        --cache-dir) RUNCACHED_CACHE_DIR="$2"; shift 2;;
        --ignore-env) RUNCACHED_IGNORE_ENV=1; shift;;
        --ignore-pwd) RUNCACHED_IGNORE_PWD=1; shift;;
        --prune-cache) RUNCACHED_PRUNE=1; shift;;
        --help) usage;;
        --) shift; break;;
        *) break;;
    esac
done
zmodload zsh/datetime
zmodload zsh/stat
[[ -d "$RUNCACHED_CACHE_DIR" ]] || mkdir -p "$RUNCACHED_CACHE_DIR" >/dev/null 2>/dev/null
((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -maxdepth 1 -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null
(
    unset OLDPWD # Almost(?) nothing uses OLDPWD, but taking it into account potentially reduces cache efficiency.
    ((RUNCACHED_IGNORE_PWD)) && unset PWD
    ((RUNCACHED_IGNORE_ENV)) || env
    echo -E "$@"
) | md5sum | read RUNCACHED_CACHE_KEY RUNCACHED__crap__
# Open the lock file on a new descriptor; zsh stores its number in RUNCACHED_LOCK_FD.
exec {RUNCACHED_LOCK_FD}>>$RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.lock
# If we can't obtain a lock, we want to run uncached; otherwise
# 'runcached' wouldn't be transparent because it would prevent
# parallel execution of several instances of the same command.
# Locking is necessary to avoid races between the mv(1) command
# below replacing stderr with a newer version and another instance
# of runcached using a newer stdout with the older stderr.
if flock -n $RUNCACHED_LOCK_FD 2>/dev/null; then
    if [[ -f $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout ]]; then
        if [[ $[EPOCHSECONDS-$(zstat +mtime $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout)] -le $RUNCACHED_MAX_AGE ]]; then
            cat $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stdout &
            cat $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.stderr >&2 &
            wait
            exit $(<$RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.exitstatus)
        else
            rm -f $RUNCACHED_CACHE_DIR/$RUNCACHED_CACHE_KEY.{stdout,stderr,exitstatus} 2>/dev/null
        fi
    fi
    # only reached if cache didn't exist or was too old
    if [[ -d $RUNCACHED_CACHE_DIR/. ]]; then
        RUNCACHED_tempdir=$(mktemp -d 2>/dev/null)
        if [[ -d $RUNCACHED_tempdir/. ]]; then
            # zsh MULTIOS: stdout and stderr each go to their original destination and to a cache file.
            # "$@" is quoted so that empty arguments are passed through unchanged.
            "$@" >&1 >$RUNCACHED_tempdir/$RUNCACHED_CACHE_KEY.stdout 2>&2 2>$RUNCACHED_tempdir/$RUNCACHED_CACHE_KEY.stderr
            RUNCACHED_ret=$?
            echo $RUNCACHED_ret >$RUNCACHED_tempdir/$RUNCACHED_CACHE_KEY.exitstatus 2>/dev/null
            mv $RUNCACHED_tempdir/$RUNCACHED_CACHE_KEY.{stdout,stderr,exitstatus} $RUNCACHED_CACHE_DIR/ 2>/dev/null
            rmdir $RUNCACHED_tempdir 2>/dev/null
            exit $RUNCACHED_ret
        fi
    fi
fi
# only reached if cache not created successfully or lock couldn't be obtained
exec "$@"
@jpbochi

jpbochi commented Aug 2, 2019

I looked for something like this more than once. Yours is lovely. If you don't mind, I'll try to adapt it to regular bash.

@akorn
Author

akorn commented Aug 2, 2019

Sure, go ahead and good luck; it's GPLv3 for a reason.

I don't think bash has such an elegant locking solution, though.

Maybe post a pointer to your bash version here when you're done?

Also, I'm curious what your use case is; care to share?

@jpbochi

jpbochi commented Aug 2, 2019

I want to have a custom status bar for iTerm2 that displays my current external IP. To avoid flooding the IP API host, I'd like to cache the results for some time.

I'll post a link for the bash version, for sure.

One issue I'm facing is that the whole thing is a bit slow. Between md5 and checking the age of the cache files, it's adding some 200+ ms.

@jpbochi

jpbochi commented Aug 2, 2019

Here it is: https://gist.github.com/jpbochi/c7971a0f3d7a3b9fdb71c5621c8e41c5

I had to get rid of the lock feature. I'm not sure how I'd do something equivalent to what you did.

@akorn
Author

akorn commented Aug 2, 2019

> One issue I'm facing is that the whole thing is a bit slow. Between md5 and checking the age of the cache files, it's adding some 200+ ms.

I don't think that can be helped -- we must generate a cache key somehow. I can't think of a faster way than md5sum.

Maybe instead of running the actual command that gets your external IP from iTerm2, have it read a file, and update the file from a cronjob? You don't even need runcached then.
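
The cron-based alternative suggested here could be sketched roughly as follows. This is only an illustration under assumptions: `STATUS_FILE`, `fetch_external_ip`, `refresh_status_file` and `read_status` are hypothetical names, and because the thread does not say which command or API looks up the address, the stub below just prints a documentation address from RFC 5737.

```shell
#!/bin/sh
# Hypothetical location of the status file; adjust to taste.
STATUS_FILE="${STATUS_FILE:-${TMPDIR:-/tmp}/external-ip.txt}"

# Placeholder for the real lookup command, which the thread does not name.
# It prints a reserved documentation address (RFC 5737) instead.
fetch_external_ip() {
    echo "192.0.2.1"
}

# Meant to be run periodically, for example from cron.
refresh_status_file() {
    tmp=$(mktemp "${STATUS_FILE}.XXXXXX") || return 1
    if fetch_external_ip >"$tmp"; then
        # Rename within the same directory, so readers never see a partial file.
        mv -f "$tmp" "$STATUS_FILE"
    else
        rm -f "$tmp"
        return 1
    fi
}

# What the status bar would call: nothing more than a file read.
read_status() {
    cat "$STATUS_FILE" 2>/dev/null || echo "unknown"
}
```

With this split, the status bar never waits for the network; it only pays for reading a small file, and the refresh interval is whatever the cron schedule says.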

> I had to get rid of the lock feature. I'm not sure how I'd do something equivalent to what you did.

You can approximate it using flock(1), but it'll be slower. See https://gist.github.com/akorn/51ee2fe7d36fa139723c851d87e56096/0219183b886a3268fd3b707cd15b335b70567eb1 for an older version that used flock(1).

You really need the locking to avoid race conditions when several instances of runcached are executing.
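
For a bash or plain sh port, the flock(1) approach mentioned above might look something like this. It is a sketch, not the code from the linked older revision: `run_locked` and the sentinel status 99 are made up for illustration, and the subshell-with-descriptor idiom is the one shown in the flock(1) manual page.

```shell
#!/bin/sh
# run_locked LOCKFILE CMD [ARGS...]: hypothetical helper.
# Runs CMD while holding an exclusive lock on LOCKFILE.
# Does not wait: returns 99 (an arbitrary sentinel) if the lock is held elsewhere.
run_locked() {
    lockfile=$1
    shift
    (
        # fd 9 refers to the lock file for the lifetime of this subshell;
        # the lock is released automatically when the subshell ends.
        flock -n 9 || exit 99
        "$@"
    ) 9>>"$lockfile"
}
```

A caller can treat status 99 as "somebody else is busy with this cache entry" and fall back to running the command uncached, which is what the script above does when it cannot obtain the lock. A command that itself exits with 99 would be indistinguishable from a lock conflict in this simplified version.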

@jpbochi

jpbochi commented Aug 5, 2019

> You really need the locking to avoid race conditions when several instances of runcached are executing.

In some cases, like mine, it's okay if two instances are running at the same time. As long as some value gets cached for following calls, it would be fine.

> Maybe instead of running the actual command that gets your external IP from iTerm2, have it read a file, and update the file from a cronjob? You don't even need runcached then.

Yeah, I ended up using a much shorter version of this script in my case.
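
For use cases like this one, where an occasional duplicate run is harmless, a much shorter lock-free variant is possible. The following is a sketch and not jpbochi's actual script: `simple_cached` and `SIMPLE_CACHE_DIR` are hypothetical names, only stdout is cached, and the key is derived from the command line alone (no environment, no working directory).

```shell
#!/bin/sh
# Hypothetical cache location, one directory per user.
SIMPLE_CACHE_DIR="${SIMPLE_CACHE_DIR:-${TMPDIR:-/tmp}/simple-cache-$(id -u)}"

# simple_cached TTL_SECONDS CMD [ARGS...]
simple_cached() {
    ttl=$1
    shift
    # If the cache directory can't be created, just run the command uncached.
    mkdir -p "$SIMPLE_CACHE_DIR" || { "$@"; return; }
    key=$(printf '%s\n' "$@" | md5sum | cut -d' ' -f1)
    file="$SIMPLE_CACHE_DIR/$key"
    now=$(date +%s)
    if [ -f "$file" ]; then
        # GNU stat first, BSD stat as a fallback.
        mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
        if [ $((now - mtime)) -le "$ttl" ]; then
            cat "$file"
            return 0
        fi
    fi
    tmp=$(mktemp "$file.XXXXXX") || { "$@"; return; }
    if "$@" >"$tmp"; then
        # Atomic rename: concurrent instances may both run the command,
        # but readers always see one complete result.
        mv -f "$tmp" "$file"
        cat "$file"
    else
        rc=$?
        # Failed runs are passed through but not cached.
        cat "$tmp"
        rm -f "$tmp"
        return "$rc"
    fi
}
```

Two instances started at the same moment will both run the command and the later rename wins, which matches the "as long as some value gets cached" requirement stated above.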

@dimo414

dimo414 commented Apr 23, 2020

Here's a Bash caching library I implemented a while back: https://github.com/dimo414/bash-cache

Edit: and more recently I've also published https://github.com/dimo414/bkt which is a standalone caching binary written in Rust, so it works anywhere not just in Bash.

@varenc

varenc commented Oct 28, 2021

Cleaning the cache with --prune-cache seems to be broken, I think.

There's this line in the script:
((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -maxdepth 1 -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null

Which looks for files (-type f) in the top level (-maxdepth 1) of $RUNCACHED_CACHE_DIR. However, runcached now uses a hierarchy of directories derived from the cache key, so there are never any files directly in this directory. For example: .runcached/c0/e9/f5fae4737f77e0ae724dfb4ac1d0.stdout. I assume an older version didn't use a hierarchy of hashed directory names, and this used to work then?

I think a simple fix would be to avoid the depth limit like this:
((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null

But I think that's only correct if all the .stdout/.stderr files will have the same creation date.

(Also, if you're on macOS, the built-in BSD version of find doesn't support -newermt, so I switched to the GNU distribution, installed as gfind.)
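
Where installing GNU find is not an option, one possible workaround is to prune by age in minutes instead. The sketch below assumes that both GNU and BSD find accept `-mmin` and `-delete`; `prune_cache` is a hypothetical helper name, and the whole-minute granularity means the maximum age is rounded up.

```shell
#!/bin/sh
# prune_cache DIR MAX_AGE_SECONDS: hypothetical helper.
# Deletes regular files under DIR (at any depth) that were last modified
# more than MAX_AGE_SECONDS ago, rounded up to whole minutes.
prune_cache() {
    dir=$1
    max_age=$2
    minutes=$(( (max_age + 59) / 60 ))
    find "$dir/." -type f -mmin "+$minutes" -delete 2>/dev/null
}
```

For example, `prune_cache "$HOME/.runcached" 300` would remove entries older than roughly five minutes. Lock files are regular files too, so they are pruned along with the rest once they are old enough.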

@akorn
Author

akorn commented Oct 29, 2021

> Cleaning the cache with --prune-cache seems to be broken, I think.

You're right; thanks, nice catch.

> There's this line in the script: ((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -maxdepth 1 -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null

> Which looks for files (-type f) in the top level (-maxdepth 1) of $RUNCACHED_CACHE_DIR. However, runcached now uses a hierarchy of directories derived from the cache key, so there are never any files directly in this directory. For example: .runcached/c0/e9/f5fae4737f77e0ae724dfb4ac1d0.stdout. I assume an older version didn't use a hierarchy of hashed directory names, and this used to work then?

That's exactly what happened, yes.

> I think a simple fix would be to avoid the depth limit like this: ((RUNCACHED_PRUNE)) && find "$RUNCACHED_CACHE_DIR/." -type f \! -newermt @$[EPOCHSECONDS-RUNCACHED_MAX_AGE] -delete 2>/dev/null

> But I think that's only correct if all the .stdout/.stderr files will have the same creation date.

The stdout and stderr files should have the same creation date as near as possible (although there are no guarantees), but the exitstatus file can be much newer.

On the surface, this seems racy: if we prune the cache, and the exitstatus file is new enough to be left around, then a subsequent instance might pick it up and use it as a cached exit status. This can't actually happen due to the locking, though: only one instance can use the same cache entry at the same time, and if the stdout file doesn't exist in line 101, it regenerates the exitstatus file too. There is no place where a stray exitstatus file could end up being used.

However, I just realized that parallel execution will not benefit from caching even if the cache is fresh, because we lock the pertinent cache entry even if we just read from it. I'll need to give this some thought when I have more time. Maybe have a separate read lock and write lock.
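
One way the read-lock/write-lock idea could be realized with flock(1) is a shared lock for readers and an exclusive lock for writers on the same lock file. This is a sketch of that general technique, not the fix that was later uploaded: `read_entry` and `write_entry` are hypothetical helpers, and both wait for the lock rather than falling back to an uncached run.

```shell
#!/bin/sh
# Hypothetical helpers: the cache entry lives in FILE, its lock in FILE.lock.

# read_entry FILE: any number of readers may hold the shared lock at once.
read_entry() {
    (
        flock -s 9 || exit 1
        cat "$1"
    ) 9>>"$1.lock"
}

# write_entry FILE: replaces FILE with the contents of stdin.
# The exclusive lock is granted only when no reader or writer holds the lock.
write_entry() {
    (
        flock -x 9 || exit 1
        cat >"$1.new" && mv -f "$1.new" "$1"
    ) 9>>"$1.lock"
}
```

With this arrangement, parallel instances that only read a fresh entry do not block each other, while regenerating an entry still excludes everyone else.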

Anyway, I believe your fix is correct.

@akorn
Author

akorn commented Feb 25, 2022

@varenc: I just uploaded a version that I believe fixes these problems (pruning and parallelism). Take a look if you're still interested.

@varenc

varenc commented Nov 1, 2022

Just want to say I still use and rely on this daily. It's always been rock solid. Thank you @akorn!

I use it so much I made my own fork of this that defines runcachedFunc as a shell function in one of my rc files. I've found that this is very slightly faster because it doesn't have to start a new zsh process to run. But mainly I did this so I can use runcached to cache the output of other shell functions.

I also added support for VERBOSE=1: when it encounters that variable in the environment, it writes logging information to stderr, which lets me know whether the function is returning from cache, whether a cache exists but is expired, or whether no cache exists at all. It's been helpful for debugging. To make this work nicely I also had to remove VERBOSE from the env used for calculating the cache key, like you do for OLDPWD. Unfortunately these verbose log messages get included in the cache, though, which is pretty janky and confusing.
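
Excluding a variable such as VERBOSE from the cache key can be done the same way the script handles OLDPWD: unset it in the subshell that produces the key input. The sketch below is an assumption about how this might look in plain sh, not varenc's fork; `cache_key_for` is a hypothetical name.

```shell
#!/bin/sh
# cache_key_for CMD [ARGS...]: hypothetical helper.
# Prints an md5 digest of the environment plus the command line,
# ignoring variables that should not influence the result.
cache_key_for() {
    (
        # Unset inside the subshell only; the caller's environment is untouched.
        unset OLDPWD VERBOSE
        env
        # One argument per line, so that "a b" and "a" "b" hash differently.
        printf '%s\n' "$@"
    ) | md5sum | cut -d' ' -f1
}
```

With this, `VERBOSE=1` and an unset VERBOSE map to the same cache entry, so turning on logging does not cause a cache miss.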

I also added support for sub-second cache TTLs just by using zstat +mtime -F "%s.%N", though that's only supported in recent-ish zsh versions.

If there are others out there who would find any of these features useful, let me know and I can try to clean this up and share it. But for rock solid quality I can't vouch enough for this as it already is. Cheers!

@akorn
Author

akorn commented Nov 1, 2022

> Just want to say I still use and rely on this daily. It's always been rock solid. Thank you @akorn!

@varenc, you're welcome; I'm glad you found it useful. :) (Funny; I hardly ever use it myself...)

I'm sure there are people who would be interested in your fork.

@devnore

devnore commented Apr 8, 2024

@varenc Could you by any chance share the runcachedFunc you mentioned? I'm thinking of integrating it in my dotfiles and would love it if I don't have to reinvent the wheel :)

@akorn
Author

akorn commented Apr 8, 2024

> @varenc Could you by any chance share the runcachedFunc you mentioned? I'm thinking of integrating it in my dotfiles and would love it if I don't have to reinvent the wheel :)

I don't immediately see a reason why just putting function runcachedFunc() { at the top and } at the bottom shouldn't work, possibly after minor fiddling.
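
One piece of the "minor fiddling" is probably that the script relies on `exit` and `exec`, both of which would end the calling interactive shell if the body ran inside an ordinary brace-delimited function. A possible way around this, sketched here with a stand-in body rather than the real script, is to give the function a subshell body by using parentheses instead of braces; `demo_func` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical stand-in for the script body: like the script, it ends with `exit`.
demo_func() (
    # Parentheses instead of braces: the body runs in a subshell,
    # so `exit` ends only the subshell, not the calling shell.
    echo "running: $*"
    exit 3
)
```

Calling `demo_func` from an interactive shell then behaves like calling a script: the shell survives and `$?` holds the exit status. The trade-off is that a subshell body cannot change the caller's variables or working directory, which should not matter for a caching wrapper.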
