@brendancol
Created November 10, 2016 17:17
HDF5 Census -> Parquet
# Install fastparquet and pytables
# conda install pytables
# conda install -c conda-forge fastparquet
# conda install python-snappy
import pandas as pd
import fastparquet as fp
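# Write file
# data_path (path to the HDF5 file) and base (key of the table inside it)
# must be defined before this point.
df = pd.read_hdf(data_path, base)
# Start a new row group at each of these row indices.
# Note: the original 2016 gist passed these as `partitions=`; current
# fastparquet releases name the keyword `row_group_offsets`.
offsets = [0, 100000000, 200000000]
fp.write('census.parq', df, row_group_offsets=offsets)
fp.write('census.gz.parq', df, row_group_offsets=offsets, compression='gzip')
fp.write('census.snappy.parq', df, row_group_offsets=offsets, compression='snappy')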
# Read file
df2 = fp.ParquetFile('census.parq').to_pandas()
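
The row-group start indices passed to fp.write above (0, 100000000, 200000000) are hard-coded for one particular table size. A minimal sketch of deriving them from a row count instead is shown here; the helper name `row_group_starts` and the row count in the example are hypothetical and not part of the original gist or of fastparquet.

```python
def row_group_starts(n_rows, rows_per_group):
    """Return the starting row index of each row group."""
    if rows_per_group <= 0:
        raise ValueError("rows_per_group must be positive")
    # range() stops before n_rows, so the last group holds the remainder.
    return list(range(0, n_rows, rows_per_group))


# Illustrative row count only, not the real size of the census table.
offsets = row_group_starts(250_000_000, 100_000_000)
print(offsets)  # [0, 100000000, 200000000]
```

In practice the first argument would presumably be len(df), and the resulting list would be passed to fp.write in place of the literal list.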