This is the fastest way I've found so far to calculate the distances between pairs of lat long coordinates. The method takes advantage of distGeo{geosphere}
and the crazy fast :=
operator of data.table
.
# load libraries
library(data.table)
library(dplyr)
library(sp)
library(rgeos)
library(UScensus2000tract)
library(geosphere)
# load data and create an Origin-Destination matrix
data("oregon.tract")
# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))
# Convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]
# create Origin-destination matrix
orig <- centroids[1:754, ]
dest <- centroids[2:755, ]
odmatrix <- bind_cols(orig,dest)
colnames(odmatrix) <- c("origi_id", "long_orig", "lat_orig", "dest_id", "long_dest", "lat_dest")
Let's make the data set larger to compare speed performance.
# Multiplicate data observations by 1000
odmatrix <- odmatrix[rep(seq_len(nrow(odmatrix)), 1000), ]
###Slow solution
system.time(
odmatrix$dist_km <- sapply(1:nrow(odmatrix),function(i)
spDistsN1(as.matrix(odmatrix[i,2:3]),as.matrix(odmatrix[i,5:6]),longlat=T))
)
> user system elapsed
> 222.17 0.08 222.84
###Fast solution
# convert the data.frame to a data.table
setDT(odmatrix)
system.time(
odmatrix[ , dist_km2 := distGeo(matrix(c(long_orig, lat_orig), ncol = 2),
matrix(c(long_dest, lat_dest), ncol = 2))/1000]
)
> user system elapsed
> 1.76 0.03 1.79
As you can see, the faster solution was more than 120 times faster than the slower alternative in my laptop.
Warning: The results are not identical, but they are roughly the same (a few centimeters-difference) for this specific example. For longer distances, though, the difference might get a bit bigger. This is because spDistsN1{sp}
uses the Euclidean or Great Circle distance between points, while distGeo{geosphere}
calculates distance on an ellipsoid