##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' 'newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
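## A minimal sketch of a function matching the description above. The name
## 'addNewDataSketch', its argument order, and the CSV input format are
## assumptions for illustration, not the original implementation. It also
## assumes every 'newVariable' already exists as a column of 'data'.
addNewDataSketch <- function(newDataFileName, data) {
  newData <- read.csv(newDataFileName, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(newData))) {
    lookupVariable <- newData$lookupVariable[i]
    newVariable <- newData$newVariable[i]
    if (is.na(lookupVariable)) {
      ## No lookup column given: overwrite every element of 'newVariable'
      data[[newVariable]] <- newData$newValue[i]
    } else {
      ## Overwrite only the rows where 'lookupVariable' equals 'lookupValue'
      lookupColumn <- data[[lookupVariable]]
      matches <- !is.na(lookupColumn) & lookupColumn == newData$lookupValue[i]
      data[[newVariable]][matches] <- newData$newValue[i]
    }
  }
  data
}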
library(dplyr)
# Create a dummy data frame with 1,000,000 rows spread across up to
# 100,000 groups, grouped by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# Naively filter the grouped data frame for rows where val is 1
system.time(df %>% filter(val == 1))
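# A hedged sketch of one common alternative to the naive filter above. It
# assumes the filter condition does not depend on the grouping, so the
# grouping can be dropped, the table filtered once, and the grouping
# restored. Whether this is faster depends on the dplyr version, so compare
# both with system.time(). 'small_df' is a hypothetical, smaller stand-in
# for 'df' so the sketch runs quickly.
library(dplyr)
small_df <- data.frame(group_id = sample(1:1e3, 1e4, replace = TRUE),
                       val = sample(1:100, 1e4, replace = TRUE)) %>%
  group_by(group_id)
filtered_ungrouped <- small_df %>%
  ungroup() %>%
  filter(val == 1) %>%
  group_by(group_id)
# The naive grouped filter keeps the same rows
filtered_naive <- small_df %>% filter(val == 1)
stopifnot(nrow(filtered_naive) == nrow(filtered_ungrouped))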
# Create 2 replicates of 5 "words" generated from random characters,
# each "word" 5 - 15 characters long, with word length following a
# Poisson distribution (lambda = 3) shifted by 5 and capped at 15.
# A single length is drawn per word; letters are not repeated within a word.
rep(replicate(5, paste(sample(letters,
                              min(rpois(1, lambda = 3) + 5, 15),
                              replace = FALSE),
                       collapse = "")), 2)
# Sample output:
# [1] "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol" "mcqhosap" "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol"
#[10] "mcqhosap"