@dwinter
Last active August 29, 2015 14:19
Biostars riffle examples

From a biostars question about interleaving vectors...

For loops can (but don't have to) be really slow in R, so I wanted to compare the answers provided to this question, pitting a straightforward for-loop approach against various workarounds that speed the process up.

First, the functions

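# Naive approach: grow the result by two elements on every pass
f_for <- function(a,b){
  res <- c()
  for (i in seq_along(a)){
    res <- c(res, a[i], b[i])
  }
  res
}

# Allocate the full result once, then fill the odd and even positions
f_preallocate <- function(a,b) {
  n <- length(a) + length(b)
  res <- character(n)
  res[seq(1,n-1,2)] <- a
  res[seq(2,n,2)] <- b
  res
}

# rbind() stacks a over b; reading the matrix column by column interleaves them
f_rbind <- function(a,b) as.character(rbind(a,b))

# Give a[i] and b[i] the same sort key; ties keep their order, so a[i] comes first
f_ord <- function(a,b){
  ord <- order(c(seq_along(a), seq_along(b)))
  res <- c(a, b)[ord]
  res
}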

fxns <- c("f_for", "f_preallocate", "f_rbind", "f_ord")
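
Before timing anything, it's worth checking that the candidates actually agree with each other. A minimal sketch of such a check follows; all_agree is a hypothetical helper of my own, not something from the biostars answers. It calls each named function on the same inputs and compares every result to the first one.

```r
# Hypothetical helper (not part of the original answers): run each named
# function on the same inputs and compare every result to the first one.
all_agree <- function(fnames, a, b) {
  results <- lapply(fnames, function(f) do.call(f, list(a = a, b = b)))
  agree <- vapply(results, function(r) identical(r, results[[1]]), logical(1))
  names(agree) <- fnames
  agree
}
```

With the functions and the example data from this page loaded, all_agree(fxns, a, b) returns one TRUE/FALSE per function name; any FALSE means that function's output differs from the first one's.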

Now, running them on the (small) example data

a <- c("1.TY","2.TY","4.TY","5.TY", "0.TY")
b <- c("1.MN","2.MN","4.MN","5.MN", "0.MN")

for(fname in fxns){
  cat(fname, ":\n\t")
  print(system.time( replicate(10000, do.call(fname, list(a=a, b=b)))))
}
## f_for :
##     user  system elapsed 
##    0.33    0.00    0.33 
## f_preallocate :
##     user  system elapsed 
##    1.64    0.03    1.67 
## f_rbind :
##     user  system elapsed 
##    0.15    0.00    0.16 
## f_ord :
##     user  system elapsed 
##    0.48    0.00    0.48

With this amount of data the for loop doesn't do too badly, and even beats the pre-allocation approach (I guess because of all the futzing to work out the lengths and build the indices).
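
If building the indices really is where the time goes (that's a guess, not something measured here), one thing to try is a recycled logical index instead of the seq() calls. The sketch below is a variant of my own, not one of the original answers, and f_logical is a made-up name. Like the pre-allocation function it assumes two character vectors of equal, non-zero length.

```r
# A sketch of a variant, not one of the original answers: a recycled logical
# index picks out the odd and even positions without building them via seq().
# Assumes character vectors of equal, non-zero length.
f_logical <- function(a, b) {
  res <- character(length(a) + length(b))
  res[c(TRUE, FALSE)] <- a   # positions 1, 3, 5, ...
  res[c(FALSE, TRUE)] <- b   # positions 2, 4, 6, ...
  res
}
```

Whether this is actually any faster would need to be timed the same way as the others.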

But how does it scale?

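The main concern with the for-loop approach is that constantly re-allocating memory for the growing vector slows the process down. So, let's see what happens when the to-be-interleaved vectors are 5000 elements long (the 5-element examples repeated 1000 times). Note that each function is run 100 times here, not 10000: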

for(fname in fxns){
  cat(fname, ":\n\t")
  print(system.time( replicate(100, do.call(fname, list(a=rep(a,1000), b=rep(b, 1000))))))
}
## f_for :
##     user  system elapsed 
##   29.58    0.01   29.72 
## f_preallocate :
##     user  system elapsed 
##    0.23    0.00    0.24 
## f_rbind :
##     user  system elapsed 
##    0.11    0.00    0.11 
## f_ord :
##     user  system elapsed 
##    0.19    0.00    0.19
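
The thing that hurts f_for is growing res with c() on every pass, not the loop as such. A minimal sketch of a loop that fills a vector allocated once up front is below. f_for_prealloc is a name I made up; it wasn't among the answers and it isn't timed here, so treat it as an illustration of "for loops don't have to be slow" rather than a benchmark result. It assumes character vectors of equal length.

```r
# A sketch, not one of the original answers: keep the explicit loop but
# allocate the result once instead of growing it with c().
# Assumes a and b are character vectors of the same length.
f_for_prealloc <- function(a, b) {
  n <- length(a)
  res <- character(2 * n)      # allocate once, up front
  for (i in seq_len(n)) {
    res[2 * i - 1] <- a[i]     # odd slot gets the element from a
    res[2 * i]     <- b[i]     # even slot gets the element from b
  }
  res
}
```

To compare it with the others, add "f_for_prealloc" to fxns and rerun the timing loop above.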