@bradmontgomery
Last active October 23, 2024 23:44
A cheat-sheet for the command-line tools for HDF5

A cheat sheet of HDF5 command-line tools.

Because I forget these exist, and then I also forget how to use them.

Installation

You may or may not get these by installing the hdf5 libraries.

  • macOS: brew install hdf5, or, to get the 1.8 version, brew install hdf5@1.8
  • Ubuntu: apt install h5utils (the core tools such as h5ls and h5dump ship in the hdf5-tools package)

The tools

  • h5ls: Like ls, but for HDF5 files. Without any options, it lists the objects (groups and datasets) at the top level of an HDF5 file.
    • h5ls -r <filename> will recursively show groups and entities
    • h5ls -v <filename> shows verbose details about the objects (h5ls -v <filename>/<groupname>)
    • h5ls <filename>/<groupname> will list the contents of the specified group.
    • h5ls -d <filename>/<groupname> will show actual values in the group/table/dataset.
  • h5stat: will print stats about a file.
    • h5stat -f <filename> will print file information.
    • h5stat -S <filename> will print a summary of statistics.
  • h5check can be used to check for errors in an HDF5 file. May need to be compiled / installed separately.
  • h5repack copies an HDF5 file to a new file with or without compression/chunking.
    • h5repack <old> <new> will repack an old file into a new one.
    • h5repack -i file1 -o file2 -f GZIP=1 -v will apply GZIP compression to all objects in the file.
  • h5perf_serial: measures HDF5 serial performance.
  • h5dump: can be used to show the contents of an HDF5 file.
    • h5dump -H -A 0 <filename> will show the structure of objects (but not attributes) in the file
    • h5dump -d '/<groupname>/table' -H <filename> will show the format of a specified data set in the file.
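
If you want to script a few of the commands above, a minimal Python sketch might look like the following. It only builds argument lists from the flags already listed here (-r, -v, -d for h5ls; -i, -o, -f GZIP=N, -v for h5repack). The function names and the data.h5 / data_gz.h5 filenames are placeholders invented for illustration, not part of any HDF5 tool.

```python
import shlex
import shutil
import subprocess


def h5ls_cmd(filename, group=None, recursive=False, verbose=False, data=False):
    """Build an h5ls argument list using only the flags listed above."""
    argv = ["h5ls"]
    if recursive:
        argv.append("-r")  # recurse into groups
    if verbose:
        argv.append("-v")  # verbose object details
    if data:
        argv.append("-d")  # print the stored values
    # h5ls addresses an object inside a file as <filename>/<groupname>.
    target = filename if group is None else f"{filename}/{group.strip('/')}"
    argv.append(target)
    return argv


def h5repack_gzip_cmd(old, new, level=1):
    """Build the h5repack call that applies GZIP compression to all objects."""
    return ["h5repack", "-i", old, "-o", new, "-f", f"GZIP={level}", "-v"]


def run_if_installed(argv):
    """Run argv and return its stdout, or None if the tool is not on PATH."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


# Print the commands instead of running them; the filenames are placeholders.
print(shlex.join(h5ls_cmd("data.h5", recursive=True)))
print(shlex.join(h5repack_gzip_cmd("data.h5", "data_gz.h5")))
```

Passing one of these lists to run_if_installed() runs the tool only when it is actually installed, which is handy given that the tools may or may not come with the HDF5 libraries.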

For more details, see the HDF Group's list of tools, or their newer list of tools.

pytables-specific tools

  • ptdump <filename> allows you to see the structure of your HDF5 file.
    • ptdump -v <filename> shows more info about the structure
    • ptdump -v -i <filename> includes info about indexes
    • ptdump -d -R 0,5 <filename> should show you data, but may error out (e.g. if indexes are out of order)
  • ptrepack is similar to h5repack and is what our analysis system uses to periodically repack files.
    • ptrepack --keep-source-filters --chunkshape=auto <old> <new> is how we repack files on the analysis server.
    • NOTE: ptrepack <input> <output> will also remove any indexes applied to table columns.
    • NOTE: If we ever see an HDF5ExtError due to some Blosc decompression error, the above repack command may (or may not) help.

See the pytables docs for more info

ptrepack examples

Copy a file, adding or changing the compression of its tables (datasets) to zlib (gzip) at compression level 8, and sort by a specific column:

ptrepack -v -o --complevel 8 --complib zlib --sortby timestamp sample.h5 sample_repacked.h5

Copy a Group from one file and save it in another under a different name:

ptrepack -v -o source.h5:/Example dest.h5:/NewName
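
If these repacks need to be generated from a script, one possible approach is a small command builder. The sketch below is only an illustration: node_spec and ptrepack_cmd are hypothetical helper names, and the only ptrepack options used are the ones that appear in the two examples above.

```python
import shlex


def node_spec(filename, node=None):
    """Return 'file' or 'file:/node', the argument form used in the examples."""
    if node is None:
        return filename
    return f"{filename}:/{node.strip('/')}"


def ptrepack_cmd(src, dst, complevel=None, complib=None, sortby=None):
    """Build a ptrepack argument list from the options shown above."""
    argv = ["ptrepack", "-v", "-o"]
    if complevel is not None:
        argv += ["--complevel", str(complevel)]
    if complib is not None:
        argv += ["--complib", complib]
    if sortby is not None:
        argv += ["--sortby", sortby]
    return argv + [src, dst]


# Rebuild the two example commands and print them rather than running them.
compress = ptrepack_cmd("sample.h5", "sample_repacked.h5",
                        complevel=8, complib="zlib", sortby="timestamp")
copy_group = ptrepack_cmd(node_spec("source.h5", "Example"),
                          node_spec("dest.h5", "NewName"))
print(shlex.join(compress))
print(shlex.join(copy_group))
```

Either list can be handed to subprocess.run() once you are happy with the printed command.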

h5check

... is a stand-alone format checker. It's not built with any of the other HDF5 tools. Download the source from their old FTP site. You'll need the HDF5 libraries installed to compile it, but then you should be able to do:

$ ./configure
$ make

If the above succeeds, you'll have a binary at tool/h5check
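
To check from a script whether the build produced that binary, something like the sketch below could work. find_h5check is a hypothetical helper name; it assumes the tool/h5check location mentioned above (relative to the source directory) and falls back to whatever h5check is on your PATH.

```python
import os
import shutil
from pathlib import Path


def find_h5check(source_dir):
    """Return the path to an h5check binary, or None if none is found."""
    # Location of the freshly built binary, relative to the source directory.
    built = Path(source_dir) / "tool" / "h5check"
    if built.is_file() and os.access(built, os.X_OK):
        return str(built)
    # Otherwise fall back to an installed copy on PATH, if there is one.
    return shutil.which("h5check")
```

The returned path, followed by the name of the file to check, can then be passed to subprocess.run().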
