@muschellij2
Created October 12, 2017 05:40
Fuzzy Matching two data sets based on a Date
library(lubridate)
# draw `size` random dates (with replacement) between `min` and `max`;
# both default to the first and last day of the current year
rdate <- function(size,
                  min = paste0(format(Sys.Date(), '%Y'), '-01-01'),
                  max = paste0(format(Sys.Date(), '%Y'), '-12-31'),
                  sort = TRUE) {
  dates <- sample(
    seq(as.Date(min), as.Date(max), by = "day"),
    size = size, replace = TRUE)
  if (sort) {
    sort(dates)
  } else {
    dates
  }
}
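# simulate two data sets that share the same ids but have independent dates;
# `ind` records the row number in df_2
id_vec = sample(letters[1:5], 100, replace = TRUE)
df = data.frame(date = rdate(100), id = id_vec)
df_2 = data.frame(date2 = rdate(100), id = id_vec, ind = 1:100)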
library(dplyr)
# join all combos of rows that share an id
oj = full_join(df, df_2, by = "id")
# just re-sort the data
oj = oj %>%
  arrange(id, date, date2)
# get absolute difference in date
oj = oj %>%
  mutate(date_diff = abs(date - date2))
# for each id/date pair, give the smallest difference and the index (`ind`)
# of the df_2 row that attains it
oj_print = oj %>%
  group_by(id, date) %>%
  mutate(min_diff = min(date_diff),
         index = ind[which.min(date_diff)])
# actually keep only the closest match for each id/date pair
# (which.min returns the first minimum, so ties go to the earliest date2)
oj = oj %>%
  group_by(id, date) %>%
  slice(which.min(date_diff))
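
The steps above can be wrapped into one reusable function. What follows is only a minimal sketch, not part of the original gist: the name `closest_by_date` and the helper column `row` are hypothetical, and it assumes the same column names as above (`id`, `date`, `date2`, `ind`). It groups by the row number of the first data set rather than by `id`/`date`, so rows that happen to share an id and a date are matched separately, and it uses base `merge()` for the all-combinations join.

```r
library(dplyr)

# Hypothetical helper (not in the original gist): for every row of `x`,
# keep the row of `y` with the same id whose date2 is closest to date.
closest_by_date <- function(x, y) {
  x <- x %>% mutate(row = row_number())   # tag each row of x so duplicates stay distinct
  merge(x, y, by = "id") %>%              # all combinations of rows within an id
    mutate(date_diff = abs(as.numeric(date - date2))) %>%  # difference in days
    group_by(row) %>%
    slice(which.min(date_diff)) %>%       # first closest match wins on ties
    ungroup() %>%
    select(-row)
}

# small hand-made example
x <- data.frame(id = c("a", "a", "b"),
                date = as.Date(c("2017-01-10", "2017-06-01", "2017-03-15")))
y <- data.frame(id = c("a", "a", "b"),
                date2 = as.Date(c("2017-01-01", "2017-05-20", "2017-12-31")),
                ind = 1:3)
res <- closest_by_date(x, y)
print(res)
```

Rows of `x` whose id never appears in `y` are dropped by `merge()` here, whereas the `full_join()` above would carry them along until the final `slice()`.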