@geotheory
Created January 11, 2021 18:51
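# dummy dataset: ggplot2's diamonds plus one line of book text per row
require(gutenbergr)
d = ggplot2::diamonds
# Project Gutenberg text 1184 (The Count of Monte Cristo), cut to nrow(d) lines
d$txt = gutenberg_download(1184)$text[1:nrow(d)]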
filename = "~/Downloads/dat.csv"
readr::write_csv(d, filename)
#-------------------
require(data.table)
chunksize = 1000
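# total number of lines in the file, header included (needs the Unix `wc` utility)
rowcount = as.numeric(stringr::str_extract(system(paste('wc -l', shQuote(path.expand(filename))), intern = TRUE), '[0-9]+'))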
field_names = strsplit(readLines(filename, n = 1), ',')[[1]]
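# read and filter one chunk at a time so the whole file never sits in memory;
# rowcount - 1 because the header line is not a data row
d = lapply(seq_len(ceiling((rowcount - 1) / chunksize)), function(i){
  # skip the header plus all rows read by earlier chunks; without the + 1 every
  # chunk after the first starts on the last row of the previous chunk
  x = fread(filename, skip = (i-1) * chunksize + 1, nrows = chunksize, header = FALSE, col.names = field_names)
  # keep rows whose text has 'Count' followed by anything but a lowercase letter
  x[stringr::str_detect(txt, 'Count[^a-z]')]
})
dplyr::bind_rows(d)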
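#-------------------
# Sanity check (a minimal sketch, not part of the original gist).
# Chunk boundaries are easy to get wrong by one row, so this compares a chunked
# read against a single full read on a small temporary file.
# Assumptions: `tmp`, `n_chunk`, `cols`, `n_rows` and `chunks` are hypothetical
# names used only for this illustration; the toy data has no embedded newlines.
library(data.table)
tmp = tempfile(fileext = ".csv")
fwrite(data.table(id = 1:25, txt = paste0("row_", 1:25)), tmp)
n_chunk = 10
cols = names(fread(tmp, nrows = 0))       # column names from the header only
n_rows = length(readLines(tmp)) - 1       # data rows, header excluded
chunks = lapply(seq_len(ceiling(n_rows / n_chunk)), function(i){
  # skip the header plus every row consumed by earlier chunks
  fread(tmp, skip = (i - 1) * n_chunk + 1, nrows = n_chunk,
        header = FALSE, col.names = cols)
})
# every row should appear exactly once, in the original order
stopifnot(isTRUE(all.equal(rbindlist(chunks), fread(tmp))))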