$ pypy --version
Python 2.7.10 (f3ad1e1e1d62, Aug 28 2015, 09:36:42)
[PyPy 2.6.1 with GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)]
numpypy installed via pypy -m pip install git+https://bitbucket.org/pypy/numpy.git on Sept. 9, 2015.
$ python
Python 2.7.10 |Continuum Analytics, Inc.| (default, May 28 2015, 17:04:42)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
conda version 0.20.0
Mid-2014 MacBook Pro: 2.5 GHz Intel i7, 16 GB of RAM, OS X 10.9.5
The benchmark covers a number of core methods in a large numerical calculation that mostly performs simple scalar operations on multi-dimensional NumPy arrays (nested for loops taking sums or products of array elements from one or more arrays). These methods call a number of smaller helper methods to complete the calculation. The code was originally written to be optimal for Numba, so it does not rely on any NumPy vectorization or indexing tricks and writes out every loop explicitly. I am unable to supply the source code.
To test PyPy and Python, the Numba jit decorators were removed, and some arrays that were calculated with the scipy.stats module were replaced with arrays of identical shape filled with dummy values, because PyPy cannot simply call out to these functions.
All times are listed in milliseconds except for the Cold Start row, which is in seconds. Cold Start includes the first call to each method, so it should include the time required to JIT-compile everything. For the individual methods, each function is called once in the timeit setup so that the overhead of jitting the code is not included in the timing. Times are the best of 3 runs, each calling the method N times, with N chosen so that the total runtime is greater than 0.1 seconds.
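
The timing procedure described above can be sketched roughly as follows. This is not the original harness, which is not available; `best_time_ms` and `work` are hypothetical names, and the rule of growing N by factors of 10 is an assumption, since the text only says that N is large enough for the total runtime to exceed 0.1 seconds.

```python
import timeit


def best_time_ms(func, repeat=3, min_total=0.1):
    """Return the best per-call time of func() in milliseconds."""
    # Passing func as the setup gives one untimed warm-up call per batch,
    # so any JIT compilation happens before the clock starts.
    timer = timeit.Timer(stmt=func, setup=func)

    # Assumption: grow N by factors of 10 until one batch of N calls
    # takes longer than min_total seconds.
    number = 1
    while timer.timeit(number) <= min_total:
        number *= 10

    # Best of `repeat` batches, converted to milliseconds per call.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e3


def work():
    # Hypothetical stand-in for one benchmarked method: explicit nested
    # loops doing scalar arithmetic.
    total = 0.0
    for i in range(100):
        for j in range(100):
            total += i * j
    return total


print("%.3f ms per call" % best_time_ms(work))
```

Taking the minimum over the repeats, rather than the mean, reduces the influence of other processes running on the machine.
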
All times are in milliseconds except where stated, and do not include the cost of jitting the functions except for the Cold Start row.
Method | Python | PyPy | Numba | Numba speed-up over PyPy |
---|---|---|---|---|
Cold Start | 8.74 s | 2.1 s | 6.2 s | - |
pd3 | 43.7 | 1.46 | 0.06 | 24x |
pd4 | 232.5 | 2.57 | 0.19 | 14x |
pw | 4.67 | 1.16 | 0.026 | 44x |
pe2 | 10.5 | 1.2 | 0.040 | 30x |
pe3 | 195.4 | 37.6 | 0.56 | 67x |
pe4 | 2500 | 463.6 | 6.42 | 72x |
pq2 | 10.6 | 1.25 | 0.15 | 8x |
pq3 | 200.7 | 38.3 | 1.56 | 25x |
pq4 | 2500 | 467.3 | 15.33 | 30x |
pp2 | 10.7 | 1.26 | 0.153 | 8.2x |
pp3 | 196.8 | 40.7 | 1.6 | 25x |
ppq | 199.4 | 40.7 | 1.9 | 21x |
ppa | 2710 | 474.2 | 17.1 | 28x |
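
The last column is the PyPy time divided by the Numba time. A minimal check of that arithmetic, using three rows copied from the table (the `speedup` helper and the `timings_ms` dictionary are illustrative names only):

```python
# (PyPy ms, Numba ms) pairs copied from three rows of the table above.
timings_ms = {
    "pd3": (1.46, 0.06),
    "pe4": (463.6, 6.42),
    "pp2": (1.26, 0.153),
}


def speedup(pypy_ms, numba_ms):
    """Ratio of PyPy time to Numba time; larger means Numba is faster."""
    return pypy_ms / numba_ms


for name in sorted(timings_ms):
    pypy_ms, numba_ms = timings_ms[name]
    print("%s: %.1fx" % (name, speedup(pypy_ms, numba_ms)))
```

These ratios agree with the rounded 24x, 72x and 8.2x listed for those rows.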