Last active
April 15, 2022 11:55
-
-
Save jfpuget/b53f1e15a37aba5944ad to your computer and use it in GitHub Desktop.
An exercise in Python optimization: make Python benchmarks as fast, if not faster, than Julia.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# A Python Performance Optimization Exercise\n", | |
"\n", | |
"## Author [Jean-François Puget](https://www.ibm.com/developerworks/community/blogs/jfp/?lang=en)\n", | |
"\n", | |
"A revisit of the Python micro benchmarks available at https://github.com/JuliaLang/julia/blob/master/test/perf/micro/perf.py\n", | |
"\n", | |
"The code used in this notebook is discussed at length in [Python Meets Julia Micro Performance](https://www.ibm.com/developerworks/community/blogs/jfp/entry/python_meets_julia_micro_performance)\n", | |
"\n", | |
"Timings below are for Anaconda 64 bits with Python 3.5.1 and Numba 0.23.1 running on a Windows laptop (Lenovo W520). Timings on another machine with another Python version can be quite different." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"First, let's load useful extensions and import useful modules" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%load_ext Cython\n", | |
"%load_ext line_profiler\n", | |
"\n", | |
"import random\n", | |
"import numpy as np\n", | |
"from numba import jit\n", | |
"import sys\n", | |
"\n", | |
"if sys.version_info < (3,):\n", | |
" range = xrange" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Fibonacci" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Naive code\n", | |
"\n", | |
"The base line is gthe one used in Julia benchmarks. It is a straightforward, frecursive implementation." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def fib(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib(n-1)+fib(n-2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Sanity check." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert fib(20) == 6765" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Cython\n", | |
"\n", | |
"Let's try various versions." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"def fib_cython(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_cython(n-1)+fib_cython(n-2)\n", | |
"\n", | |
"cpdef inline fib_cython_inline(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_cython_inline(n-1)+fib_cython_inline(n-2)\n", | |
"\n", | |
"\n", | |
"cpdef long fib_cython_type(long n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_cython_type(n-1)+fib_cython_type(n-2)\n", | |
"\n", | |
"cpdef long fib_cython_type_inline(long n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_cython_type_inline(n-1)+fib_cython_type_inline(n-2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Sanity checks" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert fib_cython(20) == 6765\n", | |
"assert fib_cython_inline(20) == 6765\n", | |
"assert fib_cython_type(20) == 6765\n", | |
"assert fib_cython_type_inline(20) == 6765" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Numba\n", | |
"\n", | |
"Let's try three versions, with and without type." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from numba import int32, int64\n", | |
"@jit\n", | |
"def fib_numba(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_numba(n-1)+fib_numba(n-2)\n", | |
"\n", | |
"@jit(int32(int32))\n", | |
"def fib_numba_type32(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_numba_type32(n-1)+fib_numba_type32(n-2)\n", | |
"\n", | |
"\n", | |
"@jit(int64(int64))\n", | |
"def fib_numba_type64(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_numba_type64(n-1)+fib_numba_type64(n-2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Sanity check" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"assert fib_numba(20) == 6765\n", | |
"assert fib_numba_type32(20) == 6765\n", | |
"assert fib_numba_type64(20) == 6765" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Using floats" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def ffib(n):\n", | |
" if n<2.0:\n", | |
" return n\n", | |
" return ffib(n-1)+ffib(n-2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Timings" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"100 loops, best of 3: 3.77 ms per loop\n", | |
"1000 loops, best of 3: 1.47 ms per loop\n", | |
"1000 loops, best of 3: 759 µs per loop\n", | |
"10000 loops, best of 3: 24.4 µs per loop\n", | |
"10000 loops, best of 3: 24.6 µs per loop\n", | |
"100 loops, best of 3: 4.89 ms per loop\n", | |
"100 loops, best of 3: 5 ms per loop\n", | |
"100 loops, best of 3: 5.01 ms per loop\n", | |
"100 loops, best of 3: 5.26 ms per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit fib(20)\n", | |
"%timeit fib_cython(20)\n", | |
"%timeit fib_cython_inline(20)\n", | |
"%timeit fib_cython_type(20)\n", | |
"%timeit fib_cython_type_inline(20)\n", | |
"%timeit fib_numba(20)\n", | |
"%timeit fib_numba_type32(20)\n", | |
"%timeit fib_numba_type64(20)\n", | |
"%timeit ffib(20.0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": false | |
}, | |
"source": [ | |
"Let's try a larger one." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1 loops, best of 3: 442 ms per loop\n", | |
"10 loops, best of 3: 178 ms per loop\n", | |
"10 loops, best of 3: 94.2 ms per loop\n", | |
"100 loops, best of 3: 3.05 ms per loop\n", | |
"100 loops, best of 3: 3.01 ms per loop\n", | |
"1 loops, best of 3: 600 ms per loop\n", | |
"1 loops, best of 3: 648 ms per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"n = 30\n", | |
"%timeit fib(n)\n", | |
"%timeit fib_cython(n)\n", | |
"%timeit fib_cython_inline(n)\n", | |
"%timeit fib_cython_type(n)\n", | |
"%timeit fib_cython_type_inline(n)\n", | |
"%timeit fib_numba(n)\n", | |
"%timeit ffib(n*1.0)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Sequential Fibonacci\n", | |
"\n", | |
"The recursive call is not the best way to compute Fibonacci numbers. It is better to accumulate them as follows." | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Functools" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from functools import lru_cache as cache\n", | |
"\n", | |
"@cache(maxsize=None)\n", | |
"def fib_cache(n):\n", | |
" if n<2:\n", | |
" return n\n", | |
" return fib_cache(n-1)+fib_cache(n-2)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Plain Python" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def fib_seq(n):\n", | |
" if n < 2:\n", | |
" return n\n", | |
" a,b = 1,0\n", | |
" for i in range(n-1):\n", | |
" a,b = a+b,a\n", | |
" return a " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Numba" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"@jit\n", | |
"def fib_seq_numba(n):\n", | |
" if n < 2:\n", | |
" return n\n", | |
" a,b = 1,0\n", | |
" for i in range(n-1):\n", | |
" a,b = a+b,a\n", | |
" return a " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Cython" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"\n", | |
"def fib_seq_cython(n):\n", | |
" if n < 2:\n", | |
" return n\n", | |
" a,b = 1,0\n", | |
" for i in range(n-1):\n", | |
" a,b = a+b,a\n", | |
" return a \n", | |
"\n", | |
"cpdef long fib_seq_cython_type(long n):\n", | |
" if n < 2:\n", | |
" return n\n", | |
" cdef long a,b\n", | |
" a,b = 1,0\n", | |
" for i in range(n-1):\n", | |
" a,b = a+b,a\n", | |
" return a " | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Sanity checks" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert fib_seq(20) == 6765\n", | |
"assert fib_seq_cython(20) == 6765\n", | |
"assert fib_seq_cython_type(20) == 6765\n", | |
"assert fib_seq_numba(20) == 6765" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Timings" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"The slowest run took 188.04 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"10000000 loops, best of 3: 127 ns per loop\n", | |
"1000000 loops, best of 3: 1.81 µs per loop\n", | |
"The slowest run took 4.46 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1000000 loops, best of 3: 959 ns per loop\n", | |
"The slowest run took 15.66 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"10000000 loops, best of 3: 82 ns per loop\n", | |
"The slowest run took 31.77 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1000000 loops, best of 3: 216 ns per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit fib_cache(20)\n", | |
"%timeit fib_seq(20)\n", | |
"%timeit fib_seq_cython(20)\n", | |
"%timeit fib_seq_cython_type(20)\n", | |
"%timeit fib_seq_numba(20)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Arbitrary precision check" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"354224848179261915075" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"fib_seq(100)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Quicksort" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Pure Python\n", | |
"\n", | |
"Code used in Julia benchmarks" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def qsort_kernel(a, lo, hi):\n", | |
" i = lo\n", | |
" j = hi\n", | |
" while i < hi:\n", | |
" pivot = a[(lo+hi) // 2]\n", | |
" while i <= j:\n", | |
" while a[i] < pivot:\n", | |
" i += 1\n", | |
" while a[j] > pivot:\n", | |
" j -= 1\n", | |
" if i <= j:\n", | |
" a[i], a[j] = a[j], a[i]\n", | |
" i += 1\n", | |
" j -= 1\n", | |
" if lo < j:\n", | |
" qsort_kernel(a, lo, j)\n", | |
" lo = i\n", | |
" j = hi\n", | |
" return a\n", | |
"\n", | |
"\n", | |
"def benchmark_qsort():\n", | |
" lst = [ random.random() for i in range(1,5000) ]\n", | |
" qsort_kernel(lst, 0, len(lst)-1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Numba" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"@jit\n", | |
"def qsort_kernel_numba(a, lo, hi):\n", | |
" i = lo\n", | |
" j = hi\n", | |
" while i < hi:\n", | |
" pivot = a[(lo+hi) // 2]\n", | |
" while i <= j:\n", | |
" while a[i] < pivot:\n", | |
" i += 1\n", | |
" while a[j] > pivot:\n", | |
" j -= 1\n", | |
" if i <= j:\n", | |
" a[i], a[j] = a[j], a[i]\n", | |
" i += 1\n", | |
" j -= 1\n", | |
" if lo < j:\n", | |
" qsort_kernel_numba(a, lo, j)\n", | |
" lo = i\n", | |
" j = hi\n", | |
" return a\n", | |
"\n", | |
"def benchmark_qsort_numba():\n", | |
" lst = [ random.random() for i in range(1,5000) ]\n", | |
" qsort_kernel_numba(lst, 0, len(lst)-1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"With Numpy" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"@jit\n", | |
"def qsort_kernel_numba_numpy(a, lo, hi):\n", | |
" i = lo\n", | |
" j = hi\n", | |
" while i < hi:\n", | |
" pivot = a[(lo+hi) // 2]\n", | |
" while i <= j:\n", | |
" while a[i] < pivot:\n", | |
" i += 1\n", | |
" while a[j] > pivot:\n", | |
" j -= 1\n", | |
" if i <= j:\n", | |
" a[i], a[j] = a[j], a[i]\n", | |
" i += 1\n", | |
" j -= 1\n", | |
" if lo < j:\n", | |
" qsort_kernel_numba_numpy(a, lo, j)\n", | |
" lo = i\n", | |
" j = hi\n", | |
" return a\n", | |
"\n", | |
"def benchmark_qsort_numba_numpy():\n", | |
" lst = np.random.rand(5000)\n", | |
" qsort_kernel_numba(lst, 0, len(lst)-1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Cython" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"import numpy as np\n", | |
"import cython\n", | |
"\n", | |
"@cython.boundscheck(False)\n", | |
"@cython.wraparound(False)\n", | |
"cdef double[:] \\\n", | |
"qsort_kernel_cython_numpy_type(double[:] a, \\\n", | |
" long lo, \\\n", | |
" long hi):\n", | |
" cdef: \n", | |
" long i, j\n", | |
" double pivot\n", | |
" i = lo\n", | |
" j = hi\n", | |
" while i < hi:\n", | |
" pivot = a[(lo+hi) // 2]\n", | |
" while i <= j:\n", | |
" while a[i] < pivot:\n", | |
" i += 1\n", | |
" while a[j] > pivot:\n", | |
" j -= 1\n", | |
" if i <= j:\n", | |
" a[i], a[j] = a[j], a[i]\n", | |
" i += 1\n", | |
" j -= 1\n", | |
" if lo < j:\n", | |
" qsort_kernel_cython_numpy_type(a, lo, j)\n", | |
" lo = i\n", | |
" j = hi\n", | |
" return a\n", | |
"\n", | |
"def benchmark_qsort_numpy_cython():\n", | |
" lst = np.random.rand(5000)\n", | |
" qsort_kernel_cython_numpy_type(lst, 0, len(lst)-1)\n", | |
" " | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"10 loops, best of 3: 17.7 ms per loop\n", | |
"The slowest run took 4.06 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1 loops, best of 3: 79.4 ms per loop\n", | |
"The slowest run took 12.62 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1 loops, best of 3: 20.9 ms per loop\n", | |
"1000 loops, best of 3: 772 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit benchmark_qsort()\n", | |
"%timeit benchmark_qsort_numba()\n", | |
"%timeit benchmark_qsort_numba_numpy()\n", | |
"%timeit benchmark_qsort_numpy_cython()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Built-in sort" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def benchmark_sort():\n", | |
" lst = [ random.random() for i in range(1,5000) ]\n", | |
" lst.sort()\n", | |
"\n", | |
"def benchmark_sort_numpy():\n", | |
" lst = np.random.rand(5000)\n", | |
" np.sort(lst)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 23, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1000 loops, best of 3: 2 ms per loop\n", | |
"1000 loops, best of 3: 306 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit benchmark_sort()\n", | |
"%timeit benchmark_sort_numpy()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Random mat stat" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Baseline\n", | |
"\n", | |
"Used in Julia benchmarks" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 24, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def randmatstat(t):\n", | |
" n = 5\n", | |
" v = np.zeros(t)\n", | |
" w = np.zeros(t)\n", | |
" for i in range(1,t):\n", | |
" a = np.random.randn(n, n)\n", | |
" b = np.random.randn(n, n)\n", | |
" c = np.random.randn(n, n)\n", | |
" d = np.random.randn(n, n)\n", | |
" P = np.matrix(np.hstack((a, b, c, d)))\n", | |
" Q = np.matrix(np.vstack((np.hstack((a, b)), np.hstack((c, d)))))\n", | |
" v[i] = np.trace(np.linalg.matrix_power(np.transpose(P)*P, 4))\n", | |
" w[i] = np.trace(np.linalg.matrix_power(np.transpose(Q)*Q, 4))\n", | |
" return (np.std(v)/np.mean(v), np.std(w)/np.mean(w))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Speeding it up" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def my_power(a):\n", | |
" a = np.dot(a.T,a)\n", | |
" a = np.dot(a,a)\n", | |
" a = np.dot(a,a)\n", | |
" return a\n", | |
" \n", | |
"def randmatstat_fast(t):\n", | |
" n = 5\n", | |
" v = np.zeros(t)\n", | |
" w = np.zeros(t)\n", | |
" for i in range(1,t):\n", | |
" a = np.random.randn(n, n)\n", | |
" b = np.random.randn(n, n)\n", | |
" c = np.random.randn(n, n)\n", | |
" d = np.random.randn(n, n)\n", | |
" P = np.hstack((a, b, c, d))\n", | |
" Q = np.vstack((np.hstack((a, b)), np.hstack((c, d))))\n", | |
" v[i] = np.trace(my_power(P))\n", | |
" w[i] = np.trace(my_power(Q))\n", | |
" return (np.std(v)/np.mean(v), np.std(w)/np.mean(w))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Timings" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"10 loops, best of 3: 160 ms per loop\n", | |
"10 loops, best of 3: 83.2 ms per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit randmatstat(1000)\n", | |
"%timeit randmatstat_fast(1000)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Randmatmul" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def randmatmul(n):\n", | |
" A = np.random.rand(n,n)\n", | |
" B = np.random.rand(n,n)\n", | |
" return np.dot(A,B)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"10 loops, best of 3: 83.7 ms per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit randmatmul(1000)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Mandelbrot" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Baseline\n", | |
"The baseline used in Julia benchmarks" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 29, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"def mandel(z):\n", | |
" maxiter = 80\n", | |
" c = z\n", | |
" for n in range(maxiter):\n", | |
" if abs(z) > 2:\n", | |
" return n\n", | |
" z = z*z + c\n", | |
" return maxiter\n", | |
"\n", | |
"def mandelperf():\n", | |
" r1 = np.linspace(-2.0, 0.5, 26)\n", | |
" r2 = np.linspace(-1.0, 1.0, 21)\n", | |
" return [mandel(complex(r, i)) for r in r1 for i in r2]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 30, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert sum(mandelperf()) == 14791" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Numba" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 31, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"@jit\n", | |
"def mandel_numba(z):\n", | |
" maxiter = 80\n", | |
" c = z\n", | |
" for n in range(maxiter):\n", | |
" if abs(z) > 2:\n", | |
" return n\n", | |
" z = z*z + c\n", | |
" return maxiter\n", | |
"\n", | |
"def mandelperf_numba():\n", | |
" r1 = np.linspace(-2.0, 0.5, 26)\n", | |
" r2 = np.linspace(-1.0, 1.0, 21)\n", | |
" c3 = [complex(r,i) for r in r1 for i in r2]\n", | |
" return [mandel_numba(c) for c in c3]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 32, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"assert sum(mandelperf_numba()) == 14791" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Cython" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 33, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"\n", | |
"import numpy as np\n", | |
"cimport numpy as np\n", | |
"\n", | |
"cdef long mandel_cython_type(np.complex z):\n", | |
" cdef long maxiter, n\n", | |
" cdef np.complex c\n", | |
" maxiter = 80\n", | |
" c = z\n", | |
" for n in range(maxiter):\n", | |
" if abs(z) > 2:\n", | |
" return n\n", | |
" z = z*z + c\n", | |
" return maxiter\n", | |
"\n", | |
"cpdef mandelperf_cython_type():\n", | |
" cdef double[:] r1,r2\n", | |
" r1 = np.linspace(-2.0, 0.5, 26)\n", | |
" r2 = np.linspace(-1.0, 1.0, 21)\n", | |
" return [mandel_cython_type(complex(r, i)) for r in r1 for i in r2]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 34, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"assert sum(mandelperf_cython_type()) == 14791" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 35, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"100 loops, best of 3: 6.57 ms per loop\n", | |
"100 loops, best of 3: 3.6 ms per loop\n", | |
"1000 loops, best of 3: 481 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit mandelperf()\n", | |
"%timeit mandelperf_cython_type()\n", | |
"%timeit mandelperf_numba()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Profiling" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"%lprun -s -f mandelperf_numba mandelperf_numba()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Numpy" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 53, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"@jit\n", | |
"def mandel_numba(z):\n", | |
" maxiter = 80\n", | |
" c = z\n", | |
" for n in range(maxiter):\n", | |
" if abs(z) > 2:\n", | |
" return n\n", | |
" z = z*z + c\n", | |
" return maxiter\n", | |
"\n", | |
"@jit\n", | |
"def mandelperf_numba_mesh():\n", | |
" width = 26\n", | |
" height = 21\n", | |
" r1 = np.linspace(-2.0, 0.5, width)\n", | |
" r2 = np.linspace(-1.0, 1.0, height)\n", | |
" mandel_set = np.empty((width,height), dtype=int)\n", | |
" for i in range(width):\n", | |
" for j in range(height):\n", | |
" mandel_set[i,j] = mandel_numba(r1[i] + 1j*r2[j])\n", | |
" return mandel_set" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 54, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert np.sum(mandelperf_numba_mesh()) == 14791" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 55, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"10000 loops, best of 3: 126 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit mandelperf_numba_mesh()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"An even faster code is presented in [How To Quickly Compute The Mandelbrot Set In Python](https://www.ibm.com/developerworks/community/blogs/jfp/entry/How_To_Compute_Mandelbrodt_Set_Quickly?lang=en)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Pisum" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def pisum(t):\n", | |
" sum = 0.0\n", | |
" for j in range(1, 501):\n", | |
" sum = 0.0\n", | |
" for k in range(1, t+1):\n", | |
" sum += 1.0/(k*k)\n", | |
" return sum\n", | |
"\n", | |
"@jit\n", | |
"def pisum_numba(t):\n", | |
" sum = 0.0\n", | |
" for j in range(1, 501):\n", | |
" sum = 0.0\n", | |
" for k in range(1, t+1):\n", | |
" sum += 1.0/(k*k)\n", | |
" return sum\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 44, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"\n", | |
"import cython\n", | |
"\n", | |
"@cython.boundscheck(False)\n", | |
"@cython.wraparound(False)\n", | |
"cpdef double pisum_cython(int t):\n", | |
" cdef:\n", | |
" double sum = 0.0\n", | |
" int j,k\n", | |
" for j in range(1, 501):\n", | |
" sum = 0.0\n", | |
" for k in range(1, t+1):\n", | |
" sum += 1.0/(k*k)\n", | |
" return sum" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 45, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"assert abs(pisum(10000)-1.644834071848065) < 1e-6\n", | |
"assert abs(pisum_numba(10000)-1.644834071848065) < 1e-6\n", | |
"assert abs(pisum_cython(10000)-1.644834071848065) < 1e-6" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 46, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1 loops, best of 3: 926 ms per loop\n", | |
"100 loops, best of 3: 20.4 ms per loop\n", | |
"10 loops, best of 3: 38.2 ms per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit pisum(10000)\n", | |
"%timeit pisum_numba(10000)\n", | |
"%timeit pisum_cython(10000)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Parse int" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"def parse_int():\n", | |
" for i in range(1,1000):\n", | |
" n = random.randint(0,2**32-1)\n", | |
" s = hex(n)\n", | |
" \n", | |
" # required for Python 2.7 on 64 bits machine\n", | |
" if s[-1]=='L':\n", | |
" s = s[0:-1]\n", | |
" \n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
" \n", | |
"def parse_int_numpy():\n", | |
" for i in range(1,1000):\n", | |
" n = np.random.randint(0,2**31-1)\n", | |
" s = hex(n)\n", | |
" \n", | |
" # required for Python 2.7 on 64 bits machine\n", | |
" if s[-1]=='L':\n", | |
" s = s[0:-1]\n", | |
" \n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
" \n", | |
"def parse_int_opt():\n", | |
" for i in range(1,1000):\n", | |
" n = random.randint(0,2**32-1)\n", | |
" s = hex(n)\n", | |
" \n", | |
" # required for Python 2.7 on 64 bits machine\n", | |
" if s.endswith('L'):\n", | |
" s = s[0:-1]\n", | |
" \n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
" \n", | |
"def parse_int1():\n", | |
" for i in range(1,1000):\n", | |
" n = random.randint(0,2**32-1)\n", | |
" s = hex(n)\n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
" \n", | |
"def parse_int1_numpy():\n", | |
" for i in range(1,1000):\n", | |
" n = np.random.randint(0,2**31-1)\n", | |
" s = hex(n)\n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
" \n", | |
"@jit\n", | |
"def parse_numba():\n", | |
" for i in range(1,1000):\n", | |
" n = random.randint(0,2**32-1)\n", | |
" s = hex(n)\n", | |
" m = int(s,16)\n", | |
" assert m == n\n", | |
"\n", | |
"@jit\n", | |
"def parse_numba_numpy():\n", | |
" for i in range(1,1000):\n", | |
" n = np.random.randint(0,2**31-1)\n", | |
" s = hex(n)\n", | |
" m = int(s,16)\n", | |
" assert m == n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"Int in C are up to 2^31-1, hence we must use this as the limit when using Cython or Numpy" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"import random\n", | |
"import cython\n", | |
"cimport cython\n", | |
"import numpy as np\n", | |
"cimport numpy as np\n", | |
"\n", | |
"@cython.boundscheck(False)\n", | |
"@cython.wraparound(False)\n", | |
"cpdef parse_int_cython():\n", | |
" cdef:\n", | |
" int i,n,m\n", | |
" for i in range(1,1000):\n", | |
" n = random.randint(0,2**31-1)\n", | |
" m = int(hex(n),16)\n", | |
" assert m == n\n", | |
" \n", | |
"@cython.boundscheck(False)\n", | |
"@cython.wraparound(False)\n", | |
"cpdef parse_int_cython_numpy():\n", | |
" cdef:\n", | |
" int i,n,m\n", | |
" for i in range(1,1000):\n", | |
" n = np.random.randint(0,2**31-1)\n", | |
" m = int(hex(n),16)\n", | |
" assert m == n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 22, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"100 loops, best of 3: 3.55 ms per loop\n", | |
"1000 loops, best of 3: 1.21 ms per loop\n", | |
"100 loops, best of 3: 3.72 ms per loop\n", | |
"100 loops, best of 3: 3.32 ms per loop\n", | |
"1000 loops, best of 3: 1.05 ms per loop\n", | |
"100 loops, best of 3: 3.63 ms per loop\n", | |
"1000 loops, best of 3: 1.13 ms per loop\n", | |
"100 loops, best of 3: 3.07 ms per loop\n", | |
"1000 loops, best of 3: 817 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit parse_int()\n", | |
"%timeit parse_int_numpy()\n", | |
"%timeit parse_int_opt()\n", | |
"%timeit parse_int1()\n", | |
"%timeit parse_int1_numpy()\n", | |
"%timeit parse_numba()\n", | |
"%timeit parse_numba_numpy()\n", | |
"%timeit parse_int_cython()\n", | |
"%timeit parse_int_cython_numpy()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Vectorized operation" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 61, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"%lprun -s -f parse_int parse_int()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 25, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"def parse_int_vec():\n", | |
" n = np.random.randint(2^31-1,size=1000)\n", | |
" for i in range(1,1000):\n", | |
" ni = n[i]\n", | |
" s = hex(ni)\n", | |
" m = int(s,16)\n", | |
" assert m == ni\n", | |
" \n", | |
"@jit\n", | |
"def parse_int_vec_numba():\n", | |
" n = np.random.randint(2^31-1,size=1000)\n", | |
" for i in range(1,1000):\n", | |
" ni = n[i]\n", | |
" s = hex(ni)\n", | |
" m = int(s,16)\n", | |
" assert m == ni" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"vhex = np.vectorize(hex)\n", | |
"vint = np.vectorize(int)\n", | |
"\n", | |
"def parse_int_numpy():\n", | |
" n = np.random.randint(0,2**31-1,1000)\n", | |
" s = vhex(n)\n", | |
" m = vint(s,16)\n", | |
" np.all(m == n)\n", | |
" return s\n", | |
"\n", | |
"@jit\n", | |
"def parse_int_numpy_numba():\n", | |
" n = np.random.randint(0,2**31-1,1000)\n", | |
" s = vhex(n)\n", | |
" m = vint(s,16)\n", | |
" np.all(m == n)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 27, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%cython\n", | |
"import numpy as np\n", | |
"import cython\n", | |
"cimport numpy\n", | |
"cimport cython\n", | |
"\n", | |
"@cython.boundscheck(False)\n", | |
"@cython.wraparound(False)\n", | |
"cpdef parse_int_vec_cython():\n", | |
" cdef:\n", | |
" int i,m\n", | |
" int[:] n\n", | |
" n = np.random.randint(0,2**31-1,1000)\n", | |
" for i in range(1,1000):\n", | |
" m = int(hex(n[i]),16)\n", | |
" assert m == n[i]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 28, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"1000 loops, best of 3: 781 µs per loop\n", | |
"The slowest run took 195.93 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1000 loops, best of 3: 824 µs per loop\n", | |
"1000 loops, best of 3: 733 µs per loop\n", | |
"The slowest run took 104.18 times longer than the fastest. This could mean that an intermediate result is being cached \n", | |
"1000 loops, best of 3: 739 µs per loop\n", | |
"1000 loops, best of 3: 471 µs per loop\n" | |
] | |
} | |
], | |
"source": [ | |
"%timeit parse_int_vec()\n", | |
"%timeit parse_int_vec_numba()\n", | |
"%timeit parse_int_numpy()\n", | |
"%timeit parse_int_numpy_numba()\n", | |
"%timeit parse_int_vec_cython()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"collapsed": true | |
}, | |
"source": [ | |
"That's it!" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.5.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment