@dantonnoriega
Last active August 1, 2019 18:43
reprex of very slow named vector search when searching with NA
# NA key names using character vector
k1 <- c('a', 'b', NA_character_)
names(k1) <- k1
v1 <- sample(k1, 1e5, replace = TRUE, prob = c(.2, .2, .6))
# same values as k1, but key the NA entry by the string "null" instead of NA
k2 <- k1
names(k2) <- c('a', 'b', 'null') # new key name
# replace NA with "null" in the search vector
v2 <- v1
v2[is.na(v2)] <- "null"
# both lookups return the same values (after removing names)
identical(unname(k1[v1]), unname(k2[v2]))
# try with numbers (note: a numeric index subsets by position, not by name)
k3 <- c(1, 2, NA_real_)
names(k3) <- k3
v3 <- sample(k3, 1e5, replace = TRUE, prob = c(.2, .2, .6))
# try with factors (note: a factor index subsets by its integer codes, not by name)
k4 <- as.factor(c('a', 'b', NA_character_))
names(k4) <- k4
v4 <- sample(k4, 1e5, replace = TRUE, prob = c(.2, .2, .6))
# but the speed difference is enormous: only the character lookup with NA is slow
kv_search <- function(k, v) k[v]
microbenchmark::microbenchmark(
kv_search(k1,v1), # slow
kv_search(k2,v2), # fast
kv_search(k3,v3), # fast
kv_search(k4,v4), # fast
times = 5, unit = 's')
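A possible workaround, sketched here under assumptions rather than taken from the gist: resolve the keys to integer positions with `match()` first, so the final subset is positional and never goes through character subsetting. The helper name `kv_match` is hypothetical, and the vector is shortened to 1e4 so the block runs quickly on its own. Whether this is faster on a given R version is something to measure, not assume.

```r
# Rebuild the reprex inputs so this block is self-contained.
set.seed(1)
k1 <- c('a', 'b', NA_character_)
names(k1) <- k1
v1 <- sample(k1, 1e4, replace = TRUE, prob = c(.2, .2, .6))

# Hypothetical helper (not in the original gist): look up positions with
# match(), then subset by integer index. match() pairs NA with NA, so an NA
# search key resolves to the NA-named element.
kv_match <- function(k, v) k[match(v, names(k))]

res_chr <- unname(k1[v1])            # original character subsetting
res_int <- unname(kv_match(k1, v1))  # positional subsetting via match()

# The returned values agree once names are dropped.
same_values <- identical(res_chr, res_int)
print(same_values)
```

`kv_match(k1, v1)` can be added as another line in the `microbenchmark` call above to compare it against `kv_search(k1, v1)`.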
@dantonnoriega

According to hadley:

There’s an internal heuristic used to decide whether to build up a hashmap/dictionary when doing character subsetting. I’d guess this is related.
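One rough way to probe that guess is to time both lookups at a few input sizes and see how each scales. This is only a sketch: the function name `time_lookup`, the sizes, and the use of base `system.time()` instead of `microbenchmark` are all assumptions made for illustration, and the timings will vary by machine and R version.

```r
set.seed(1)

# NA-named keys, as in k1 from the reprex.
k_na <- c('a', 'b', NA_character_)
names(k_na) <- k_na

# Same values, but the NA entry is keyed by "null", as in k2.
k_null <- k_na
names(k_null) <- c('a', 'b', 'null')

# Hypothetical helper: time one lookup of each kind for a search vector of length n.
time_lookup <- function(n) {
  v_na <- sample(unname(k_na), n, replace = TRUE, prob = c(.2, .2, .6))
  v_null <- v_na
  v_null[is.na(v_null)] <- 'null'
  c(n = n,
    na_key = system.time(k_na[v_na])[['elapsed']],
    null_key = system.time(k_null[v_null])[['elapsed']])
}

# One row per input size; columns are n, na_key, null_key (elapsed seconds).
timings <- t(sapply(c(1e3, 2e3, 4e3), time_lookup))
print(timings)
```

If the `na_key` column grows much faster than `null_key` as `n` doubles, that would be consistent with the NA lookups missing a hashed path, though it would not confirm the cause.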
