@burchill
Last active May 18, 2018 16:31
Typed NAs and dplyr's `case_when()`
# dplyr's `case_when()` function is a great replacement for nested `ifelse()` calls:
# it requires fewer keystrokes and is somewhat faster as well.
# The only "problem" is that it is strict about types: every output value has to be the same type, and that is easy to trip over with NAs.
# Let's see how this works
# A data frame
df <- data.frame(
  x = runif(100),
  y = runif(100) + 0.2 # So we have a few values > 1
)
# Let's try `case_when()`
df$x_factor <- factor(dplyr::case_when(df$x < 0.33 ~ "one",
                                       df$x >= 0.33 & df$x < 0.66 ~ "two",
                                       TRUE ~ "three")) # The default value
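# For comparison, a minimal sketch of the same binning written as nested `ifelse()` calls.
# (`x_demo`, `nested`, and `flat` are made-up names for illustration, not part of the original example.)
x_demo <- c(0.1, 0.5, 0.9)
nested <- ifelse(x_demo < 0.33, "one",
                 ifelse(x_demo < 0.66, "two", "three"))
flat <- dplyr::case_when(x_demo < 0.33 ~ "one",
                         x_demo < 0.66 ~ "two",
                         TRUE ~ "three")
identical(nested, flat) # Both approaches should give the same character vector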
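# In dplyr versions before 1.1.0 the next call won't work, because `case_when` needs all values to be the same type!
# (dplyr 1.1.0 and later cast a bare NA to the common type, so there it runs without an error.)
df$x_char <- dplyr::case_when(df$x < 0.33 ~ "uno",
                              df$x >= 0.33 & df$x < 0.66 ~ "dos",
                              TRUE ~ NA) # This value isn't a character, it's a bare NA, which by default is of type "logical"!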
# See:
is.character(NA) # FALSE
is.logical(NA)   # TRUE
# However, there are TYPED NAs in R!
# `NA_character_`, `NA_integer_`, `NA_real_`, `NA_complex_`
# See:
is.character(NA_character_) # TRUE
is.numeric(NA_integer_)     # TRUE
is.integer(NA_integer_)     # TRUE
is.numeric(NA_real_)        # TRUE
is.integer(NA_real_)        # FALSE (it is a double)
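# A quick sketch using base R's `typeof()` to show the storage type behind each NA constant.
# (`na_constants` is a made-up name for illustration.)
typeof(NA)            # "logical"
typeof(NA_character_) # "character"
typeof(NA_integer_)   # "integer"
typeof(NA_real_)      # "double"
typeof(NA_complex_)   # "complex"
# Whatever the type, each one still counts as missing:
na_constants <- list(NA, NA_character_, NA_integer_, NA_real_, NA_complex_)
all(vapply(na_constants, is.na, logical(1))) # TRUE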
# Therefore:
library(dplyr) # Needed for `%>%` and `mutate()`
df %>%
  # `case_when` can also be used in `tidyverse` code, e.g., `mutate`, etc.
  mutate(y_char = dplyr::case_when(y < 0.33 ~ "one",     # Arguments will be evaluated in order
                                   y < 0.66 ~ "two",     # so in this case we don't need the >='s.
                                   y < 1 ~ "three",      # `case_when` needs all the elements to
                                   TRUE ~ NA_character_)) # be the same type, so use a typed NA
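# The same idea carries over to numeric outputs. A minimal sketch, assuming a small
# made-up data frame `df2` with hypothetical columns `y_capped` and `y_rank`:
df2 <- data.frame(y = c(0.1, 0.5, 0.9, 1.1))
df2$y_capped <- dplyr::case_when(df2$y < 1 ~ df2$y,
                                 TRUE ~ NA_real_)    # `y` is a double, so pair it with NA_real_
df2$y_rank <- dplyr::case_when(df2$y < 0.33 ~ 1L,
                               df2$y < 0.66 ~ 2L,
                               df2$y < 1 ~ 3L,
                               TRUE ~ NA_integer_)   # Integer outputs pair with NA_integer_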