
@JoFAM
Created November 30, 2017 14:34
For loops versus outer: You might be surprised by which is faster.
# x is a dataset. The goal is to construct a new vector that holds, for each of the 5 "quartiles"
# (the 0%, 25%, 50%, 75% and 100% quantiles), that quantile minus every value of x.
# This constructs the data
x <- rnorm(1e6)
quart <- quantile(x)
# The original code of one of my students. I loathed two things:
# - the use of a for loop when one could use outer()
# - the growing of an object
oorspronkelijk <- function(x, quart){
  u_total <- c()
  for(i in seq_along(quart)){
    u_tmp <- quart[i] - x
    u_total <- c(u_total, u_tmp)
  }
  return(u_total)
}
# First I rewrote the code so that I preallocate the memory. Good practice, ya know.
nogrowth <- function(x, quart){
  nx <- length(x)
  nq <- length(quart)
  u_total <- numeric(nx*nq)
  xid <- seq_along(x)
  for(i in seq_along(quart)){
    tid <- xid + (i-1)*nx
    u_total[tid] <- quart[i] - x
  }
  return(u_total)
}
# Next, I rewrite everything with outer. Trying to be as slick as possible, I hack
# my way into constructing the vector of the original code.
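metouter <- function(x, quart){
  out <- outer(x, quart, "-")
  # Dropping the dim attribute flattens the matrix column by column.
  dim(out) <- NULL
  # outer() computed x - quart, so negate to get quart - x.
  -out
}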
all.equal(oorspronkelijk(x, quart),
          metouter(x, quart))
## [1] TRUE
all.equal(nogrowth(x, quart),
          metouter(x, quart))
## [1] TRUE
# So these methods all give the same result.
# Benchmarking this gives something I really did not expect:
# my student's code was actually the fastest.
library(rbenchmark)
benchmark(oorspronkelijk(x, quart),
          metouter(x, quart),
          nogrowth(x, quart),
          columns = c("test","elapsed","relative"))
##                       test elapsed relative
## 2       metouter(x, quart)   26.82    2.349
## 3       nogrowth(x, quart)   21.13    1.850
## 1 oorspronkelijk(x, quart)   11.42    1.000
@csgillespie

You're not quite comparing like with like. It matters here that you are only "growing" the vector 5 times (length(quart)), so the cost of growing is small. An optimised version of oorspronkelijk() would be something like:

no_grow <- function(x, quart){
  lx = length(x)
  u_total = numeric(lx* length(quart))
  
  for(i in seq_along(quart)){
    ids = ((i-1)*lx +1):(i*lx)
    u_total[ids] = quart[i] - x
  }
  return(u_total)
}

which is faster than all the other functions:

benchmark(oorspronkelijk(x, quart),
           no_grow(x, quart), replications = 100,
           columns = c("test","elapsed","relative"))
                      test elapsed relative
2        no_grow(x, quart)   4.300    1.000
1 oorspronkelijk(x, quart)   5.204    1.210
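
As a quick sanity check that the two functions being timed return the same vector, here is a minimal sketch on a small input. The seed and the sample size of 1000 are arbitrary choices for illustration and are not taken from the gist; the function bodies are copied from above.

```r
# Arbitrary small input for illustration (not the 1e6 values used in the gist).
set.seed(1)
x <- rnorm(1000)
quart <- quantile(x)

# Student's version: grows u_total once per quantile.
oorspronkelijk <- function(x, quart){
  u_total <- c()
  for(i in seq_along(quart)){
    u_tmp <- quart[i] - x
    u_total <- c(u_total, u_tmp)
  }
  return(u_total)
}

# Preallocated version using from:to index ranges.
no_grow <- function(x, quart){
  lx <- length(x)
  u_total <- numeric(lx * length(quart))
  for(i in seq_along(quart)){
    ids <- ((i - 1) * lx + 1):(i * lx)
    u_total[ids] <- quart[i] - x
  }
  return(u_total)
}

res_loop     <- oorspronkelijk(x, quart)
res_prealloc <- no_grow(x, quart)

# Same values in the same order, one block of length(x) per quantile.
identical(res_loop, res_prealloc)
length(res_prealloc) == length(x) * length(quart)
```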

The reason the nogrowth() function performs poorly is that

tid <- xid + (i-1)*nx

performs large vector operations that seem to take longer than growing the vector.
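
One visible difference between the two ways of building the index is the type of the result, which the sketch below inspects on a tiny input. The names idx_nogrowth, idx_no_grow and idx_integer are made up for this illustration, and the `1L` variant is a hypothetical tweak rather than anything proposed in the thread. Whether the type difference accounts for the timing gap is an assumption that would need benchmarking on your own R version.

```r
x   <- rnorm(10)
nx  <- length(x)      # integer
xid <- seq_along(x)   # integer vector 1..10
i   <- 2L             # loop counter as produced by seq_along(quart)

# Index as built in nogrowth(): (i - 1) is double, so the whole
# length(x) vector is promoted to double.
idx_nogrowth <- xid + (i - 1) * nx

# Index as built in no_grow(): from:to yields an integer sequence
# when the endpoints are whole numbers that fit in an integer.
idx_no_grow <- ((i - 1) * nx + 1):(i * nx)

# Hypothetical tweak: subtract 1L so the arithmetic stays integer.
idx_integer <- xid + (i - 1L) * nx

typeof(idx_nogrowth)
typeof(idx_no_grow)
typeof(idx_integer)

# All three address the same positions in u_total.
all(idx_nogrowth == idx_no_grow)
identical(idx_no_grow, idx_integer)
```

Running the three `typeof()` calls shows which constructions stay in integer arithmetic; timing the variants inside the loop with `benchmark()` is the way to see whether that matters in practice.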
