@explodecomputer
Last active December 18, 2020 10:09
Tidyverse notes

Website: https://www.tidyverse.org/packages/

Comparison of dplyr and base functions: https://cran.r-project.org/web/packages/dplyr/vignettes/base.html
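
To give a flavour of that comparison, here is a minimal sketch that does the same task both ways on the built-in mtcars data. The choice of columns and the filter condition are arbitrary assumptions made for illustration, not taken from the vignette.

```r
library(dplyr)

# Base R: subset rows and columns with [ , ], then sort with order()
base_res <- mtcars[mtcars$cyl == 4, c("mpg", "wt")]
base_res <- base_res[order(base_res$mpg), ]

# dplyr: one verb per step
four_cyl  <- filter(mtcars, cyl == 4)    # keep rows where cyl is 4
two_cols  <- select(four_cyl, mpg, wt)   # keep only the mpg and wt columns
dplyr_res <- arrange(two_cols, mpg)      # sort by mpg, ascending
```

Both results hold the same mpg values in the same order; the dplyr version names each step instead of packing everything into the brackets.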

Piping:

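library(dplyr)  # provides the %>% pipe (re-exported from magrittr)
author <- " Person 1, Person 2, ..."

# Each step passes its result on as the first argument of the next call;
# the dot places it explicitly where it is not the first argument.
author %>%
  as.character() %>%
  stringr::str_trim() %>%
  gsub("\\.\\.\\.", "et al", .)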

versus the equivalent nested call, which has to be read from the inside out:

gsub("\\.\\.\\.", "et al", stringr::str_trim(as.character(author)))
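
As a rough sketch of what the pipe does with its left-hand side, the three cases below cover most everyday use. The vector x and the result names are made up for illustration.

```r
library(dplyr)  # re-exports %>% from magrittr

x <- c(4, 9, 16)

# x %>% f() is f(x)
res_plain <- x %>% sqrt()              # same as sqrt(x)

# x %>% f(y) is f(x, y): the left-hand side becomes the first argument
res_first <- x %>% paste("units")      # same as paste(x, "units")

# x %>% f(y, .) is f(y, x): the dot marks where the left-hand side goes
res_dot <- x %>% paste("value", .)     # same as paste("value", x)
```

The last form is the one used with gsub() above, because the string to modify is gsub()'s third argument rather than its first.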

What is "tidy data"?

"Tidy datasets are all alike, but every messy dataset is messy in its own way." (Hadley Wickham)

The R for Data Science book defines "tidy data" by three rules (https://r4ds.had.co.nz/tidy-data.html):

  • Each variable must have its own column.
  • Each observation must have its own row.
  • Each value must have its own cell.
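
The rules above can be sketched with a small example. The table below is hypothetical (made-up countries and counts): the variable "year" is hidden in the column names, so each row holds two observations. One way to tidy it, assuming tidyr 1.0.0 or later is installed, is pivot_longer().

```r
library(tidyr)

# Hypothetical messy table: one column per year, so "year" is not a column
messy <- data.frame(
  country = c("A", "B"),
  `1999` = c(10, 30),
  `2000` = c(20, 40),
  check.names = FALSE  # keep "1999" and "2000" as column names
)

# Move the year columns into a (year, cases) pair of columns:
# one row per country-year observation, one column per variable
tidy <- pivot_longer(
  messy,
  cols = -country,       # every column except country
  names_to = "year",     # old column names go here
  values_to = "cases"    # old cell values go here
)
```

The result has four rows and the columns country, year and cases. Note that year comes out as a character column, since it was built from column names.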

A more in-depth discussion is in Hadley Wickham's "Tidy Data" paper: https://www.jstatsoft.org/article/view/v059i10

There is also plenty of material on YouTube, e.g. https://www.youtube.com/watch?v=ZM04jn95YP0, which comes with this gist of examples: https://gist.github.com/larsentom/727da01476ad1fe5c066a53cc784417b
