@Malarkey73
Last active January 1, 2016 05:09
A slightly different comparison of R and Julia vectorisation, closer to what most R people mean when they say "vectorised". The code example is plucked from the Julia blog discussion on code optimisation. Surprisingly, for this use case Julia devectorised (~0.18 secs) is just a little better than Julia vectorised (~0.25 secs), which is a little…
This is sort of a reply to this post:
http://www.johnmyleswhite.com/notebook/2013/12/22/the-relationship-between-vectorized-and-devectorized-code/
which arises from a discussion of this:
http://julialang.org/blog/2013/09/fast-numeric/
So....
## This is Julia code
# executable code here: https://gist.github.com/Malarkey73/8097545
# vectorised function from Julia blog
function vectorised(x, y)
    r = exp(-abs(x - y))
    return r
end
# recommended devectorised function
function devectorised(x, y)
    r = similar(x)
    for i = 1:length(x)
        r[i] = exp(-abs(x[i] - y[i]))
    end
    return r
end
# timer func for vectorised (adapted from John Myles White's original code)
function timevec(N, x, y)
    timings = Array(Float64, N)
    # Force compilation
    vectorised(x, y)
    for itr in 1:N
        timings[itr] = @elapsed vectorised(x, y)
    end
    return timings
end
# timer func for devectorised (adapted from John Myles White's original code)
function timedevec(N, x, y)
    timings = Array(Float64, N)
    # Force compilation
    devectorised(x, y)
    for itr in 1:N
        timings[itr] = @elapsed devectorised(x, y)
    end
    return timings
end
# data
julia> x = [1:2e6]
julia> y = x * 2
# tldr RESULTS !!!!!!!!!!!!!!!!!!
julia> median(timevec(50,x,y))
0.24664389599999997
julia> median(timedevec(50,x,y))
0.176243553
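# NOTE: the timings above were taken on 2013-era Julia. On a current (1.x) release the
# same comparison needs dot-broadcast syntax, since exp and abs no longer apply
# elementwise to arrays. A rough sketch of a modern re-run (assuming the BenchmarkTools
# package is installed; timings will differ from those above):
using BenchmarkTools

x = collect(1.0:2e6)                      # replaces [1:2e6], which no longer builds a vector
y = x .* 2

vectorised(x, y) = exp.(-abs.(x .- y))    # broadcast ("dot") version

function devectorised(x, y)
    r = similar(x)
    for i in eachindex(x)
        r[i] = exp(-abs(x[i] - y[i]))
    end
    return r
end

@belapsed vectorised($x, $y)              # minimum time in seconds, per BenchmarkTools
@belapsed devectorised($x, $y)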
------------------------------------------------------
# This is now R
# executable code with added Rcpp here: https://gist.github.com/hadley/8097300
# vectorised function
vectorised <- function(x, y)
{
  r = exp(-abs(x - y))
  return(r)
}
# devectorised function
devectorised <- function(x, y)
{
  r = rep(NA, length(x))
  for(i in seq_along(x))
  {
    r[i] = exp(-abs(x[i] - y[i]))
  }
  return(r)
}
# data
> x = 1:2e6
> y = x * 2
# tldr RESULTS !!!!!!!!!!!!!!!!!!
> microbenchmark(
    vectorised(x, y),
    devectorised(x, y),
    unit = "s", times = 5)
Unit: seconds
               expr       min        lq    median        uq        max neval
   vectorised(x, y) 0.2058464 0.2165744 0.2610062 0.2612965  0.2805144     5
 devectorised(x, y) 9.7923054 9.8095265 9.8097871 9.8606076 10.0144012     5
@Malarkey73 (Author)

@HarlanH Sorry, there was actually another typo above: devectorised Julia is fractionally faster, both for the wrong way I did it and for the right way you have pointed out (for this use case).

NB both ways give me the same answer.

@johnmyleswhite

Interesting example. This is a very close analogue to the second set of snippets in my post: almost all of the time is spent in memory allocation for r, so there's not much to be gained from devectorization.
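
A minimal sketch of taking that allocation out of the timed loop, in the same 2013-era syntax as the gist above (devectorised! and timedevec! are illustrative names, not part of the original code):

function devectorised!(r, x, y)
    # write into a caller-supplied buffer instead of allocating r on every call
    for i = 1:length(x)
        r[i] = exp(-abs(x[i] - y[i]))
    end
    return r
end

function timedevec!(N, x, y)
    timings = Array(Float64, N)
    r = similar(x)              # allocate the output once, outside the timed loop
    devectorised!(r, x, y)      # force compilation
    for itr in 1:N
        timings[itr] = @elapsed devectorised!(r, x, y)
    end
    return timings
end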

@Malarkey73 (Author)

I chose this function because the original was a loop of 1e6 * 2-fold vector additions, so this is just a 2e6-element iteration. It's the sort of code you see in everyday data munging or matrix algebra. I want to learn more Julia, so I may try benchmarking DataFrames logical indexing and apply-type operations, because these are the true data analysis workhorses.
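
For what that might look like, a rough sketch using the current DataFrames.jl API (which differs from the 2013 one; the column names and data here are made up for illustration):

using DataFrames, Statistics

df = DataFrame(a = rand(10^6), g = rand(1:10, 10^6))

# logical indexing: keep only the rows where a > 0.5
sub = df[df.a .> 0.5, :]

# apply-type operation: mean of a within each group g
means = combine(groupby(df, :g), :a => mean => :mean_a)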
