@cwebber314
Last active December 24, 2015 01:29
Build a pandas HDFStore incrementally, a little at a time. This is useful for large data sets that do not fit in memory.
"""
References
--------------
[1] http://stackoverflow.com/questions/16997048/how-does-one-append-large-amounts-of-data-to-a-pandas-hdfstore-and-get-a-natural/16999397#16999397
[2] http://stackoverflow.com/questions/19036380/filtering-a-pytables-table-on-pandas-import
[4] http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore
"""
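# Build pandas dataframe on disk
import pandas as pd
import numpy as np

# Append small frames one at a time so the full data set never sits in memory.
store = pd.HDFStore('foo.h5')
for i in range(10):
    d = {'branch': ['foo', 'bar'], 'flow': np.random.randn(2)}
    df = pd.DataFrame(d)
    # Write through the open store under the key 'df' (the key queried below).
    # data_columns makes 'flow' usable in a `where` filter.
    store.append('df', df, data_columns=['flow'])
store.close()

# Reopen the file and load only the rows that match the filter.
store = pd.HDFStore('foo.h5')
df2 = store.select('df', where='flow > 0.9')
print(df2)
store.close()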
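
The append-then-query pattern above can also be wrapped in small helpers, so that both writing and reading work one piece at a time. What follows is only a minimal sketch, not part of the original gist: the names `make_chunks`, `build_store`, and `count_matches` are hypothetical, the random data stands in for a real source, and it assumes PyTables is installed (pandas needs it for HDF5). It relies on `HDFStore.append` with `data_columns` and on `HDFStore.select` with `chunksize`, which returns an iterator of DataFrames rather than one large frame.

```python
import numpy as np
import pandas as pd


def make_chunks(n_chunks, rows_per_chunk, seed=0):
    """Yield small DataFrames one at a time (stand-in for a real data source)."""
    rng = np.random.RandomState(seed)
    for _ in range(n_chunks):
        yield pd.DataFrame({
            # Alternate the two branch labels used in the gist.
            'branch': [('foo', 'bar')[i % 2] for i in range(rows_per_chunk)],
            'flow': rng.randn(rows_per_chunk),
        })


def build_store(path, key, chunks):
    """Append each chunk to an on-disk table; only one chunk is held in memory."""
    # mode='w' starts from an empty file so repeated runs do not pile up rows.
    with pd.HDFStore(path, mode='w') as store:
        for chunk in chunks:
            # data_columns makes 'flow' queryable with `where`.
            store.append(key, chunk, data_columns=['flow'])


def count_matches(path, key, where, chunksize=4):
    """Count rows matching `where`, reading the result in pieces."""
    total = 0
    with pd.HDFStore(path, mode='r') as store:
        # With chunksize set, select() yields DataFrames of at most
        # `chunksize` rows; iterate while the store is still open.
        for piece in store.select(key, where=where, chunksize=chunksize):
            total += len(piece)
    return total
```

A typical call would be `build_store('foo.h5', 'df', make_chunks(10, 2))` followed by `count_matches('foo.h5', 'df', 'flow > 0.9')`. Because every chunk carries its own 0-based index, index values repeat in the stored table; reference [1] discusses ways to get a unique index instead.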