@hannes
Created September 24, 2019 08:18
# Install the ALTREP branch of miniparquet like so:
# remotes::install_github("hannesmuehleisen/miniparquet", ref="altrep")
options(tibble.print_max = 10, tibble.print_min = 10)
# parquet file from https://archive.luftdaten.info/parquet/2019-08/sds011/part-00000-54e23417-8f54-4a91-9b6b-b8724706a9a7-c000.snappy.parquet
f <- "pqtest/big_data.snappy.parquet"
# show the file size on disk (shQuote() guards against spaces in the path)
system(sprintf("ls -lah %s", shQuote(f)))
# read_parquet() only reads the file metadata and sets up ALTREP vectors;
# no column data is loaded at this point
df <- miniparquet::read_parquet(f)
# head() and print.tibble() take lazy subsets of the ALTREP columns and are thus fast
system.time(str(head(df)))
system.time(print(tibble::tibble(df)))
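# A minimal base-R sketch of the same lazy idea (not part of the original gist;
# assumes R >= 3.5.0, where `:` returns a compact ALTREP sequence). The vector
# below is described by its start and length rather than stored element by
# element, and head() only needs its first six values. This illustrates ALTREP
# in general, not how miniparquet implements its columns.
x <- 1:1e7
first <- head(x)
print(first)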