Quick HDF5 benchmark

Xarthisius commented Jan 7, 2016

Please also analyse this:

import numpy as np
from tempfile import mkdtemp
import os.path as path
filename = path.join(mkdtemp(), 'newfile.dat')
n = 100000
arr = np.random.rand(n, 1000)
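# Write the data to a raw float32 file through a memory map (values are cast from float64).
fp = np.memmap(filename, dtype='float32', mode='w+', shape=(n, 1000))
fp[:] = arr[:]
del fp  # deleting the memmap flushes the changes to disk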

then:

%%timeit -n1 -r1
newfp = np.memmap(filename, dtype='float32', mode='r', shape=(n, 1000))
print(newfp[:2, :3])
del newfp

and

%%timeit -n1 -r1
foo = np.fromfile(filename, dtype="float32").reshape(n, 1000)
print(foo[:2, :3])

no HDF5 involved...

Xarthisius commented Jan 7, 2016

I messed up my first comment, sorry for that. I'm pasting it again for reference:
I'm not entirely sure you're aware of what happens during each call. I'd suggest trying this:

%%timeit -n1 -r1
f = h5py.File('test.h5', 'r')
print(f['/test'][:2, :3])
f.close()

Author

rossant commented Jan 7, 2016

Benchmark updated following comment by Stuart Berg.

andrewcollette commented Jan 7, 2016

@rossant, a big part of this is that the fancy-indexing code in h5py uses a naive algorithm based on repeated hyperslab selection, which is quadratic in the number of indices. It was designed/tested for small numbers of indices.

The particular example you have here (0 to 10000 in steps of 10) can be mapped to slices (although of course this is not generally true). In this case the results are:

%%timeit f = h5py.File('test.h5','r')
f['/test'][0:10000:10]

100 loops, best of 3: 12 ms per loop

This is a great argument to improve the implementation of fancy indexing in h5py, but I would hesitate to conclude "HDF5 is slow".
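
For readers who want to apply the same trick outside this one example, here is a minimal sketch of checking whether a list of indices is evenly spaced and, if so, turning it into a slice. The helper name `indices_to_slice` is hypothetical (it is not part of h5py or NumPy), and it only handles non-negative, strictly increasing indices:

```python
def indices_to_slice(indices):
    """Return a slice equivalent to `indices`, or None if no such slice exists.

    Hypothetical helper for illustration; only non-negative, strictly
    increasing, evenly spaced indices are converted.
    """
    indices = list(indices)
    if not indices or any(i < 0 for i in indices):
        return None
    if len(indices) == 1:
        return slice(indices[0], indices[0] + 1, 1)
    step = indices[1] - indices[0]
    if step <= 0:
        return None
    # Every consecutive pair must be separated by the same step.
    for prev, cur in zip(indices, indices[1:]):
        if cur - prev != step:
            return None
    return slice(indices[0], indices[-1] + 1, step)


# The index list from the benchmark: 0 to 10000 in steps of 10.
selection = indices_to_slice(range(0, 10000, 10))
print(selection)

# An irregular list cannot be expressed as a slice.
print(indices_to_slice([0, 1, 3]))
```

When the helper returns a slice, it could presumably be passed straight to the dataset, as in `f['/test'][selection]`; when it returns `None`, the caller would have to fall back to fancy indexing.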

Author

rossant commented Jan 8, 2016

@andrewcollette thanks for your comment, I've updated the benchmarks and the post accordingly. And thanks for doing h5py! Despite the problems we've had with HDF5, I actually like the h5py API and how it fits so naturally with NumPy.

rossant/benchmark.ipynb
