Suppose you have a long-running calculation:

f <- function(x) {
  message("Evaluating slow function")
  Sys.sleep(5)  # sleep 5 seconds to simulate a long running time
  x
}

Which is used like so:

f(10)

However, you only want to rerun f() sometimes (say, when an upstream data source changes). I usually do something like this:

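run.cached <- function(expr, filename, regenerate=FALSE) {
  if ( file.exists(filename) && !regenerate ) {
    # cache hit: load the saved result
    res <- readRDS(filename)
  } else {
    # no cache yet (or forced rerun): evaluate expr in the caller's frame and save it
    res <- eval.parent(substitute(expr))
    saveRDS(res, file=filename)
  }
  res
}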
}

This is a simple caching function: it loads the .rds file named by filename if it exists; otherwise it runs the expression in expr and saves the output to filename. If you specify regenerate=TRUE, it always reruns the expression and overwrites the file.
So you can do this:

run.cached(f(5), 'mycache.rds') # runs the slow function
run.cached(f(5), 'mycache.rds') # won't run, returns cached result
run.cached(f(10), 'mycache.rds', TRUE) # runs the slow function and overwrites the cache

Note that the cache is keyed only by filename, so without regenerate=TRUE the last call would return the cached f(5) result. When I want to make sure everything works correctly for the final published version, I delete the .rds files, which forces everything to be recalculated.
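
Deleting the cache by hand works, but the "rerun when an upstream data source changes" case could also be automated by comparing file modification times. Here is a minimal sketch of that idea, not something I have battle-tested: run.cached.dep and its depends argument are hypothetical names introduced for illustration, and it assumes the upstream data lives in files whose modification times are trustworthy.

```r
# Hypothetical variant of run.cached(): rerun when the cache file is missing
# or older than any of the upstream files listed in 'depends'.
run.cached.dep <- function(expr, filename, depends = character(0)) {
  # fail early if an upstream file is missing, rather than comparing against NA
  stopifnot(all(file.exists(depends)))
  stale <- !file.exists(filename) ||
    any(file.mtime(depends) > file.mtime(filename))
  if (stale) {
    res <- eval.parent(substitute(expr))
    saveRDS(res, file = filename)
  } else {
    res <- readRDS(filename)
  }
  res
}

# Usage with temporary files standing in for real data and cache paths
src   <- tempfile(fileext = ".csv")
cache <- tempfile(fileext = ".rds")
writeLines(c("a,b", "1,2"), src)

run.cached.dep(nrow(read.csv(src)), cache, depends = src)  # computes and caches
run.cached.dep(nrow(read.csv(src)), cache, depends = src)  # loads the cached value
```

This only notices changes to the listed files. Like run.cached(), it will not notice that the expression itself has changed, so deleting the .rds files is still the safest check before publishing.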
There are a variety of packages on CRAN that do this already, apparently: R.cache, SOAR, and (for Sweave) cacheSweave. These may be more robust!
knitr has a caching option as well, which has worked well for me so far. It seems to do some pretty clever wizardry to tell if any recalculating is needed.
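
For reference, knitr's caching is controlled by the cache chunk option. A minimal sketch, assuming a document that is processed with knitr and has a setup chunk near the top; it turns caching on for every chunk:

```r
# In a setup chunk: cache chunk results between runs.
# An individual chunk can still opt out with cache=FALSE in its header.
library(knitr)
opts_chunk$set(cache = TRUE)
```

As far as I can tell, knitr then reruns a cached chunk only when it detects that the chunk's code or options have changed.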