Lennart Klein kleinlennart

  • United Nations | Executive Office of the Secretary-General (EOSG)
  • New York
  • LinkedIn in/lennart-klein
kleinlennart / tweet_count.R
Created November 12, 2020 21:08
Add the total tweet count per user to a Twitter dataset (user aggregation level)
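library(dplyr)

# Adds tweet_count: the number of rows (tweets) per user_id, repeated on every row of that user
add_tweet_count <- function(dat) {
  dat %>%
    group_by(user_id) %>%
    mutate(tweet_count = n()) %>%
    ungroup() # remove the grouped_df class so later operations are not grouped
}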
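A shorter variant, assuming dplyr 1.0 or later: `add_count()` performs the group/count/ungroup steps in one call, and `name` sets the new column name. The function name `add_tweet_count2` and the small `tweets` table are hypothetical, made up for illustration.

```r
library(dplyr)

# Hypothetical helper with the same result as add_tweet_count():
# tweet_count = number of rows sharing the row's user_id
add_tweet_count2 <- function(dat) {
  add_count(dat, user_id, name = "tweet_count")
}

# Made-up example data: user 1 has two tweets, user 2 has one
tweets <- tibble(user_id = c(1, 1, 2), text = c("a", "b", "c"))
counted <- add_tweet_count2(tweets)
```

Because the input is not grouped, `add_count()` returns an ungrouped table, so no extra `ungroup()` is needed here.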
kleinlennart / hashtags.R
Created November 12, 2020 20:33
Extract hashtags from tweets
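library(dplyr)
library(stringr)

# Clean hashtag extraction: lowercase the text, then extract every "#" followed by letters, digits or "_"
## TODO: possibly also remove umlauts!
add_hashtags <- function(dat) {
  dat %>%
    mutate(
      hashtags = text %>% tolower() %>% str_extract_all("#[[:alnum:]_]+") # list column: one character vector per tweet
    )
}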
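Since `hashtags` ends up as a list column, a common next step is to flatten it and count tags. This is a sketch assuming dplyr, stringr and tidyr 1.x; `count_hashtags` and the example tweets are hypothetical.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical helper: extract lowercase hashtags per tweet (same regex as add_hashtags()),
# flatten the list column to one row per hashtag, then count occurrences
count_hashtags <- function(dat) {
  dat %>%
    mutate(hashtags = str_extract_all(tolower(text), "#[[:alnum:]_]+")) %>%
    select(hashtags) %>%
    unnest(hashtags) %>%            # tweets without hashtags (character(0)) are dropped
    count(hashtags, sort = TRUE)    # most frequent hashtag first
}

# Made-up example tweets
tweets <- tibble(text = c("Hello #R #rstats", "#r rocks", "no tags here"))
tag_counts <- count_hashtags(tweets)
```

Lowercasing before extraction is what makes `#R` and `#r` count as the same tag.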
kleinlennart / unlist_col.R
Created October 25, 2020 15:47
Unlist a column rowwise
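library(dplyr)

# bundesland (federal state), community and fach (subject) are list columns;
# each element must hold exactly one value, because rowwise mutate() needs length-1 results
dat <- dat %>%
  rowwise() %>%
  mutate(bundesland = unlist(bundesland),
         community = unlist(community),
         fach = unlist(fach)) %>%
  ungroup() # important: removes the rowwise_df class, which otherwise slows down all later operations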
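When every element of the list columns holds exactly one value, a vectorised alternative skips `rowwise()` entirely. A minimal sketch, assuming dplyr 1.0 or later; the example data below is made up.

```r
library(dplyr)

# Made-up example data: list columns whose elements each hold exactly one value
dat <- tibble(
  id = 1:3,
  bundesland = list("Berlin", "Bayern", "Hessen"),
  fach = list("Physik", "Chemie", "Biologie")
)

# No rowwise() needed: unlist() flattens each whole column in one vectorised call.
# Only safe when every element has length 1; NULL or multi-value elements would
# change the column length, and mutate() would raise an error.
dat_flat <- dat %>%
  mutate(across(c(bundesland, fach), unlist))
```

Because no rowwise grouping is created, there is also no `rowwise_df` class to remove afterwards.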
kleinlennart / mutate.R
Created October 24, 2020 17:43
dplyr mutate() function collection
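library(dplyr)
# Mutate numeric (0 / 1) coded columns to logical.
# Note: this converts *every* numeric column; 0 becomes FALSE and any nonzero value becomes TRUE
dat <- s1 %>%
  mutate(across(where(is.numeric), as.logical))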
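If the data also contains genuine numeric measurements, a stricter predicate can limit the conversion to columns that really are 0/1 coded. A sketch assuming dplyr 1.0 or later; `is_binary` and the example `s1` are hypothetical.

```r
library(dplyr)

# Hypothetical predicate: TRUE only for numeric columns whose values are all 0, 1 or NA
is_binary <- function(x) is.numeric(x) && all(x %in% c(0, 1, NA))

# Made-up example data: a and c are 0/1 coded, b is a real measurement
s1 <- tibble(a = c(0, 1, 1), b = c(2.5, 0, 1), c = c(1, NA, 0))

# Only the 0/1 columns become logical; b stays numeric
dat <- s1 %>%
  mutate(across(where(is_binary), as.logical))
```

Missing values are allowed by the predicate and stay `NA` after `as.logical()`.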
kleinlennart / col_type.R
Created October 21, 2020 11:23
Select columns by data type in dplyr
library(dplyr)
dat %>% select(where(is.list)) # dplyr >= 1.0; select_if(is.list) still works but is superseded
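A small sketch of type-based selection on made-up data, assuming dplyr 1.0 or later (tidyselect supports combining `where()` predicates with `|`, `&` and `!`).

```r
library(dplyr)

# Made-up example data with one list column (e.g. several media URLs per tweet)
dat <- tibble(
  user_id = c(1, 2),
  text = c("a", "b"),
  media_url = list(c("u1", "u2"), NA)
)

# Only the list column
list_cols <- dat %>% select(where(is.list))
# Predicates combined with tidyselect's boolean operators: character OR list columns
text_or_list <- dat %>% select(where(is.character) | where(is.list))
```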
kleinlennart / list_index.R
Last active October 21, 2020 14:17
Logically index a list of lists in R
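# Keep sublists of length 2
dat$media_url[sapply(dat$media_url, function(x) length(x) == 2)]
# Keep sublists without NA values: all() also covers several NA values per sublist,
# and vapply() guarantees exactly one TRUE/FALSE per element (plain !is.na(x) breaks for length > 1)
dat$media_url[vapply(dat$media_url, function(x) all(!is.na(x)), logical(1))]
# Drop NA sublists from any list (the generic.class name also makes it an S3 method for na.omit())
na.omit.list <- function(object, ...) {
  object[vapply(object, function(x) all(!is.na(x)), logical(1))]
}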
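Base R's `Filter()` expresses the same "keep sublists without NA" logic without building the index by hand. A sketch on a made-up list that only mimics a `media_url` list column; the values are hypothetical.

```r
# Made-up example list: NA, a two-element sublist, a single value, and an all-NA sublist
media_url <- list(NA, c("a.jpg", "b.jpg"), "c.jpg", c(NA, NA))

# Filter() keeps the elements for which the predicate returns TRUE,
# here: sublists that contain no NA values (anyNA() checks the whole sublist)
kept <- Filter(function(x) !anyNA(x), media_url)
```

`!anyNA(x)` is equivalent to `all(!is.na(x))` but stops at the first `NA` it finds.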
kleinlennart / user.R
Created September 30, 2020 21:54
Use user-data subsets for faster processing in a drake workflow
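library(drake)
library(dplyr)
drake_plan(
  # one row per user; drop "tweet level" vars for faster runtime (unique() cannot take a column, distinct() can)
  user_data = tidy_data %>% distinct(user_id, .keep_all = TRUE) %>% select(all_of(user_related_cols)),
  geo_data = do_user_data_stuff(),
  # left_join(): keep every row of user_data and add the matching tidy_data rows (one output row per match)
  joined_data = target(
    command = left_join(user_data, tidy_data, by = "user_id"),
    format = "fst" # drake's fst format: faster storage for data frames (requires the fst package)
  )
)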
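The user-subset step can be checked outside drake on a tiny table. A sketch assuming dplyr 1.0 or later; `tidy_data`, `user_related_cols` and their values are made up. Joining back only the tweet-level columns avoids `.x`/`.y` suffixes on columns that exist in both tables.

```r
library(dplyr)

# Made-up tweet-level data: user 1 tweeted twice
tidy_data <- tibble(
  user_id = c(1, 1, 2),
  screen_name = c("anna", "anna", "ben"),
  text = c("t1", "t2", "t3")
)
# Hypothetical set of user-level columns
user_related_cols <- c("user_id", "screen_name")

# One row per user (first occurrence kept), restricted to user-level columns
user_data <- tidy_data %>%
  distinct(user_id, .keep_all = TRUE) %>%
  select(all_of(user_related_cols))

# Re-attach tweet-level columns only, so no duplicated screen_name.x / screen_name.y
joined <- left_join(user_data, select(tidy_data, user_id, text), by = "user_id")
```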
kleinlennart / _render-pages-plan.R
Last active March 12, 2021 18:25
drake-flavoured `render_site()` page build routine with static branching
# needs to be outside of the drake plan for `rmd_files = !!pages_paths` to work
pages_paths <- dir("reports", pattern = "\\.Rmd$", full.names = TRUE) # `pattern` is a regex, not a glob
# TODO: exclude _file.Rmd with regex in dir
plan <- drake_plan(
render_pages = target(
# TODO: Add a trigger dependency on _site.yml
command = rmarkdown::render(
knitr_in(rmd_files),
output_dir = file_out("docs/"),
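One way to handle the TODO about excluding `_file.Rmd` is to make the regex reject names that start with an underscore. A sketch in a throwaway temporary folder with made-up file names; `dir()` matches `pattern` against file names only, even with `full.names = TRUE`.

```r
# Create a throwaway "reports" folder with made-up file names
reports_dir <- file.path(tempdir(), "reports")
dir.create(reports_dir, showWarnings = FALSE)
invisible(file.create(file.path(reports_dir, c("index.Rmd", "about.Rmd", "_file.Rmd", "notes.md"))))

# Regex: names that do not start with "_" and end in ".Rmd"
pages_paths <- dir(reports_dir, pattern = "^[^_].*\\.Rmd$", full.names = TRUE)
```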
kleinlennart / codebook_plan.R
Created September 28, 2020 22:04
Parsing a data codebook from Excel in R with a drake plan
codebook = file_in(URL)
kleinlennart / date.Rmd
Created September 28, 2020 21:46
Best YAML date format for an R Markdown file
---
date: "`r format(Sys.time(), '%d. %B, %Y')`"
---
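The inline R code in the YAML header is evaluated when the document is knitted. To preview the format string, it can be run on a fixed date instead of `Sys.time()`. `%B` is the full month name in the current `LC_TIME` locale, so the exact month text depends on the machine.

```r
# Fixed date instead of Sys.time() so the result is reproducible
d <- as.Date("2020-09-28")
# %d = day of month, %B = full month name (locale-dependent), %Y = four-digit year
formatted <- format(d, "%d. %B, %Y")
```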