Last active
January 1, 2016 05:09
-
-
Save Malarkey73/8096475 to your computer and use it in GitHub Desktop.
A slightly different comparison of R and Julia vectorisation closer to what most R people mean when they say "vectorised". The code example is plucked from the Julia blog discussion on code optimisation. Surprisingly for this use case Julia devectorised (~=0.18 secs) is just a little better than Julia vectorised (~= 0.25 secs), which is a little…
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is sort of a reply to this post: | |
http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/ | |
which arises from a a discussion of this | |
http://julialang.org/blog/2013/09/fast-numeric/ | |
So.... | |
## This is Julia code | |
# executable code here: https://gist.github.com/Malarkey73/8097545 | |
# vectorised function from Julia blog | |
function vectorised(x,y) | |
r = exp(-abs(x-y)) | |
return r | |
end | |
# recommended devectorised function | |
function devectorised(x,y) | |
r = similar(x) | |
for i = 1:length(x) | |
r[i] = exp(-abs(x[i]-y[i])) | |
end | |
return r | |
end | |
# timer func for vectorised (adapted from john Myles White original code) | |
function timevec(N, x,y) | |
timings = Array(Float64, N) | |
# Force compilation | |
vectorised(x,y) | |
for itr in 1:N | |
timings[itr] = @elapsed vectorised(x,y) | |
end | |
return timings | |
end | |
# timer func for devectorised (adapted from john Myles White original code) | |
function timedevec(N, x,y) | |
timings = Array(Float64, N) | |
# Force compilation | |
devectorised(x,y) | |
for itr in 1:N | |
timings[itr] = @elapsed devectorised(x,y) | |
end | |
return timings | |
end | |
# data | |
julia> x= [1: 2e6] | |
julia> y= x * 2 | |
# tldr RESULTS !!!!!!!!!!!!!!!!!! | |
julia> median(timevec(50,x,y)) | |
0.24664389599999997 | |
julia> median(timedevec(50,x,y)) | |
0.176243553 | |
------------------------------------------------------ | |
#This is now R | |
# executable code with added Rcpp here: https://gist.github.com/hadley/8097300 | |
# vectorised function | |
vectorised <- function(x, y) | |
{ | |
r = exp(-abs(x-y)) | |
return(r) | |
} | |
#devectorised function | |
devectorised <- function(x,y) | |
{ | |
r=rep(NA, length(x)) | |
for(i in seq_along(x)) | |
{ | |
r[i]= exp(-abs(x[i]-y[i])) | |
} | |
return(r) | |
} | |
# data | |
> x= 1:2e6 | |
> y= x * 2 | |
# tldr RESULTS !!!!!!!!!!!!!!!!!! | |
> microbenchmark( | |
vectorised(x,y), | |
devectorised(x,y), | |
unit = "s", times=5) | |
Unit: seconds | |
expr min lq median uq max neval | |
vectorised(x, y) 0.2058464 0.2165744 0.2610062 0.2612965 0.2805144 5 | |
devectorised(x, y) 9.7923054 9.8095265 9.8097871 9.8606076 10.0144012 5 | |
I chose this function because the original was a loop 1e6 * 2-fold vector addition. So this is just a 2e6 iteration. It's the sort of code you see in every day data munging, or matrix algebra. I want to learn more Julia so i may try benchmarking Data.frames logical indexing and and apply type operations because these are the true data analysis workhorses.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Interesting example. This is a very close analogue to the second set of snippets in my post: almost all of the time is spent in memory allocation for
r
, so there's not much to be gained from devectorization.