Created
January 5, 2018 09:07
Gist: pierdom/e0496a294e045cbccb7c22d2e07cddf5
[Pandas DataFrame storage with Apache Parquet] using Apache Arrow (from https://tech.blue-yonder.com/efficient-dataframe-storage-with-apache-parquet/) #python #bigdata #pandas #datascience #parquet
# READING PARQUET FILES TO PANDAS
import pyarrow.parquet as pq
df = pq.read_table('<filename>').to_pandas()
# Only read a subset of the columns
df = pq.read_table('<filename>', columns=['A', 'B']).to_pandas()
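# WRITING PARQUET FILES WITH PANDAS
import pyarrow as pa
import pyarrow.parquet as pq
# data_frame is an existing pandas DataFrame
table = pa.Table.from_pandas(data_frame)
# Cast timestamps to millisecond resolution on write (older pyarrow releases
# took timestamps_to_ms=True in from_pandas instead); allow_truncated_timestamps
# avoids an error when sub-millisecond precision is dropped
pq.write_table(table, '<filename>', coerce_timestamps='ms', allow_truncated_timestamps=True)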