
@pierdom
Created January 5, 2018 09:07
[Pandas DataFrame storage with Apache Parquet] using Apache Arrow (from https://tech.blue-yonder.com/efficient-dataframe-storage-with-apache-parquet/) #python #bigdata #pandas #datascience #parquet
# READING PARQUET FILES TO PANDAS
import pyarrow.parquet as pq
df = pq.read_table('<filename>').to_pandas()
# Only read a subset of the columns
df = pq.read_table('<filename>', columns=['A', 'B']).to_pandas()
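# The same reads can also go through pandas' built-in wrapper, which can use
# pyarrow as its engine. This is an alternative to the calls above, not part
# of the original snippet; '<filename>' is the same placeholder as above.
import pandas as pd
df = pd.read_parquet('<filename>', columns=['A', 'B'], engine='pyarrow')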
# WRITING PARQUET FILES WITH PANDAS
import pyarrow as pa
import pyarrow.parquet as pq
# Convert the pandas DataFrame to an Arrow Table, then write it to Parquet.
# Recent pyarrow releases removed the timestamps_to_ms argument of
# Table.from_pandas; coerce_timestamps in write_table gives the same
# millisecond resolution.
table = pa.Table.from_pandas(data_frame)
pq.write_table(table, '<filename>', coerce_timestamps='ms')
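# END-TO-END EXAMPLE
# Illustrative sketch only: the sample DataFrame and the file name
# 'example.parquet' are assumptions, not part of the original snippet.
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

example_df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': pd.date_range('2018-01-01', periods=3),
})
# Write the DataFrame out, coercing timestamps to millisecond resolution
pq.write_table(pa.Table.from_pandas(example_df), 'example.parquet',
               coerce_timestamps='ms')
# Read back only column 'A'
restored = pq.read_table('example.parquet', columns=['A']).to_pandas()
print(restored)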