Created
November 30, 2017 14:34
JoFAM/662df43f55ba1c15b00277064605e509
For loops versus outer: You might be surprised by which is faster.
# x is a dataset, and one wants to construct a new dataset created by x minus the 5 "quartiles",
# being 0%, 25%, 50%, 75% and 100% quantiles.

# This constructs the data
x <- rnorm(1e6)
quart <- quantile(x)

# The original code of one of my students. I loathed two things:
# - the use of a for loop when one could use outer()
# - the growing of an object
oorspronkelijk <- function(x, quart){
  u_total <- c()
  for(i in seq_along(quart)){
    u_tmp <- quart[i] - x
    u_total <- c(u_total, u_tmp)
  }
  return(u_total)
}

# First I rewrote the code so that I preallocate the memory. Good practice, ya know.
nogrowth <- function(x, quart){
  nx <- length(x)
  nq <- length(quart)
  u_total <- numeric(nx*nq)
  xid <- seq_along(x)
  for(i in seq_along(quart)){
    tid <- xid + (i-1)*nx
    u_total[tid] <- quart[i] - x
  }
  return(u_total)
}

# Next, I rewrite everything with outer. Trying to be as slick as possible, I hack
# my way into constructing the vector of the original code.
metouter <- function(x, quart){
  out <- outer(x, quart, "-")
  dim(out) <- NULL
  - out
}

all.equal(oorspronkelijk(x, quart),
          metouter(x, quart))
## [1] TRUE
all.equal(nogrowth(x, quart),
          metouter(x, quart))
## [1] TRUE

# So these methods all give the same result.
# Benchmarking this gives something I really did not expect:
# my student's code was actually the fastest.
library(rbenchmark)
benchmark(oorspronkelijk(x, quart),
          metouter(x, quart),
          nogrowth(x, quart),
          columns = c("test", "elapsed", "relative"))
##                       test elapsed relative
## 2       metouter(x, quart)   26.82    2.349
## 3       nogrowth(x, quart)   21.13    1.850
## 1 oorspronkelijk(x, quart)   11.42    1.000
You're not quite comparing like with like. This makes a difference because you are only "growing" the vector 5 times (length(quart)). An optimised version of oorspronkelijk() would be something like:
which is faster than all the other functions.
The reason the nogrowth() function performs poorly is that it performs large vector operations that seem to take longer than growing the vector.
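The commenter's own code is not preserved here, so the following is only a sketch of the idea and not their version. It assumes two hypothetical helpers, met_rep() and met_matrix(), that build the same vector as oorspronkelijk() without constructing the index vector tid on every iteration. The smaller sample size is also an assumption, chosen to keep the check quick.

```r
# Hypothetical variants for illustration; not the commenter's original code.
set.seed(1)
x <- rnorm(1e4)          # smaller than the gist's 1e6, to keep the check fast
quart <- quantile(x)

# Reference: the student's loop from the gist.
oorspronkelijk <- function(x, quart){
  u_total <- c()
  for(i in seq_along(quart)){
    u_total <- c(u_total, quart[i] - x)
  }
  u_total
}

# Variant 1 (assumed name): no loop at all. Each quantile is repeated
# length(x) times and x is recycled across the five blocks.
met_rep <- function(x, quart){
  rep(unname(quart), each = length(x)) - x
}

# Variant 2 (assumed name): preallocate a matrix and fill it column by
# column, so no index vector has to be computed inside the loop.
met_matrix <- function(x, quart){
  out <- matrix(0, nrow = length(x), ncol = length(quart))
  for(i in seq_along(quart)){
    out[, i] <- quart[i] - x
  }
  dim(out) <- NULL       # flatten column-wise, matching the loop's order
  out
}

# Both variants should reproduce the student's result.
all.equal(met_rep(x, quart), oorspronkelijk(x, quart))
all.equal(met_matrix(x, quart), oorspronkelijk(x, quart))
```

Whether either variant actually beats the student's loop depends on the machine and the R version, so it is worth adding them to the benchmark() call above rather than taking that on trust.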