Last active
June 26, 2023 12:25
Compare HDF5 and Feather performance (speed, file size) for storing / reading pandas dataframes
As of 2022, to_feather compresses data with lz4 by default. HDF5 with blosc:lz4 at complevel 5 reaches a similar compression ratio. Once you add strings to the mix, Feather's advantage is less clear for big dataframes, especially in read times. See the modified version at https://github.com/fizban99/hdf_vs_feather/blob/main/hdf_vs_feather.ipynb
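For anyone who wants to try the comparison described above on their own data, here is a minimal sketch. It is not the code from the gist or from the linked notebook. The helper names (make_frame, benchmark), the column layout, and the file names are made up for illustration. It assumes pandas and NumPy are installed, and that pyarrow (for Feather) and PyTables (for HDF5) are available; a format whose optional dependency is missing is skipped rather than failing.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def make_frame(n_rows, with_strings=False, seed=0):
    """Build a hypothetical test frame; optionally add a string column."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "a": rng.normal(size=n_rows),
        "b": rng.integers(0, 1000, size=n_rows),
    })
    if with_strings:
        df["s"] = rng.choice(["alpha", "beta", "gamma"], size=n_rows)
    return df


def benchmark(df, directory):
    """Write and read df in each format; return timings and file sizes."""
    formats = {
        # Feather: rely on the writer's default compression.
        "feather": (
            lambda p: df.to_feather(p),
            lambda p: pd.read_feather(p),
        ),
        # HDF5: blosc:lz4 at complevel 5, as suggested in the comment above.
        "hdf5": (
            lambda p: df.to_hdf(p, key="df", mode="w",
                                complib="blosc:lz4", complevel=5),
            lambda p: pd.read_hdf(p, key="df"),
        ),
    }
    results = {}
    for name, (write, read) in formats.items():
        path = os.path.join(directory, f"frame.{name}")
        try:
            t0 = time.perf_counter()
            write(path)
            t1 = time.perf_counter()
            loaded = read(path)
            t2 = time.perf_counter()
        except ImportError:
            # Optional dependency (pyarrow or PyTables) is not installed.
            continue
        results[name] = {
            "write_s": t1 - t0,
            "read_s": t2 - t1,
            "size_bytes": os.path.getsize(path),
            "roundtrip_ok": loaded.shape == df.shape,
        }
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, stats in benchmark(make_frame(1_000_000), tmp).items():
            print(name, stats)
```

Pass with_strings=True to make_frame to see how a string column shifts the numbers. Single-run timings like these are noisy, so repeat the measurement several times before drawing conclusions.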
Great analysis, thanks for sharing.
Good analysis, thanks!