This is great, thanks!
Great explanation of exactly what I needed!
Q: Why is HDF5 so popular when Feather seems to be doing an amazing job? Is there any instance where we are better off choosing HDF5 over Feather?
It would be great if the author could extend the benchmark to use compression for the HDF5 format. I'm looking for the best data format to store a huge amount of data split across files of ~3000 rows each. Since I need to store a huge number of such files, I have to trade off speed against size. Compression is therefore an important parameter for me, and I'd like to see compressed HDF5 compared with Feather.
Very clear and concise comparison. Many thanks for doing this!
Good analysis, thanks!
As of 2022, to_feather compresses data by default with lz4. Using hdf5 with blosc:lz4 at complevel 5 reaches a similar compression ratio. If you add strings into the mix, the superiority of feather is not that clear with big dataframes, especially in reading times. See the modified version at https://github.com/fizban99/hdf_vs_feather/blob/main/hdf_vs_feather.ipynb
Great analysis, thanks for sharing.
Thank you for such a great example of a methodical approach to the problem.
You left pretty much no room for doubt.
I love it as a scientist and as an engineer. Thank you.