This is missing PyPy numbers.
Okay. I installed PyPy 7.3.15 and CPython 3.9.18 in two conda environments, with NumPy in each, and repeated In [2] and In [4] above.
Here is the CPython run, which reproduces the order of magnitude above: it's now 145 ms per run_python(200, 300) call (timeit with number=10 reports ~1.45 s), rather than 249 ms. That's within a factor of 2, with a different Python version, environment, and who-knows-what else since I first ran this test on June 14, 2022.
Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import numpy as np
>>> def run_python(height, width):
... y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
... c = x + y*1j
... fractal = np.full(c.shape, 20, dtype=np.int32)
... for h in range(height):
... for w in range(width): # for each pixel (h, w)...
... z = c[h, w]
... for i in range(20): # iterate at most 20 times
... z = z**2 + c[h, w] # applying z → z² + c
... if abs(z) > 2: # if it diverges (|z| > 2)
... fractal[h, w] = i # color the plane with the iteration number
... break # we're done, no need to keep iterating
... return fractal
...
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.442052590999083
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.4416049659994314
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.445725762998336
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.4504199469993182
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.4678468480015
Now here's PyPy. You can see from the banner that it implements the same Python language version, 3.9.18.
Python 3.9.18 | packaged by conda-forge | (9c4f8ef1, Mar 08 2024, 07:32:51)
[PyPy 7.3.15 with GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import timeit
>>>> import numpy as np
>>>> def run_python(height, width):
.... y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
.... c = x + y*1j
.... fractal = np.full(c.shape, 20, dtype=np.int32)
.... for h in range(height):
.... for w in range(width): # for each pixel (h, w)...
.... z = c[h, w]
.... for i in range(20): # iterate at most 20 times
.... z = z**2 + c[h, w] # applying z → z² + c
.... if abs(z) > 2: # if it diverges (|z| > 2)
.... fractal[h, w] = i # color the plane with the iteration number
.... break # we're done, no need to keep iterating
.... return fractal
....
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
8.935209687999304
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
8.972501267999178
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
8.909452853000403
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
8.971896891998767
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
8.998815111001022
PyPy is more than 6× slower than CPython here. I don't know why: I would have expected PyPy's JIT compiler to make it several times faster than CPython, though nothing like the factor of 200× that you can get by compiling without maintaining Python's dynamic runtime environment (that is, C++, Numba, JAX, ...).
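For concreteness, the compiled path would look something like this minimal Numba sketch (my own illustration, not one of the timed runs above; run_numba is a hypothetical name, and the coordinate arithmetic just reproduces what np.ogrid does in run_python):

import numpy as np
import numba as nb

@nb.njit
def run_numba(height, width):
    # same kernel as run_python, but JIT-compiled with static types:
    # no boxed Python complex objects in the inner loop
    fractal = np.full((height, width), 20, dtype=np.int32)
    for h in range(height):
        for w in range(width):
            # pixel (h, w) -> point c in the plane, matching np.ogrid above
            c = complex(-1.5 + 1.5 * w / (width - 1),
                        -1.0 + 1.0 * h / (height - 1))
            z = c
            for i in range(20):              # iterate at most 20 times
                z = z**2 + c                 # applying z -> z^2 + c
                if abs(z) > 2:               # if it diverges (|z| > 2)
                    fractal[h, w] = i
                    break
    return fractal

Note that the first call pays the JIT-compilation cost, so timings should be taken on subsequent calls.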
My experience is that PyPy doesn't interact well with NumPy. I think this is part of the motivation for the HPy project (https://hpyproject.org/).
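One way to see the boundary cost directly is a micro-benchmark like this sketch (my own illustration): every arr[h, w] read goes through PyPy's emulated C API (cpyext) and boxes a fresh complex object, while lst[h][w] stays entirely inside the interpreter.

import timeit
import numpy as np

arr = np.full((200, 300), 1 + 1j)   # 2-d NumPy array of complex
lst = arr.tolist()                  # the same data as a list of lists

# on PyPy, expect the first access to be far slower than the second;
# on CPython the gap should be much smaller
print(timeit.timeit(lambda: arr[100, 200], number=1_000_000))
print(timeit.timeit(lambda: lst[100][200], number=1_000_000))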
Good point... CPython is a little faster when dealing with its own builtin types, too.
Replacing the NumPy data structure with a list of lists (in the hot part of the loop), here's CPython again:
Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10)
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import numpy as np
>>> def run_python(height, width):
... y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
... c = x + y*1j
... fractal = np.full(c.shape, 20, dtype=np.int32).tolist()
... x, y, c = x.tolist(), y.tolist(), c.tolist()
... for h in range(height):
... for w in range(width): # for each pixel (h, w)...
... z = c[h][w]
... for i in range(20): # iterate at most 20 times
... z = z**2 + c[h][w] # applying z → z² + c
... if abs(z) > 2: # if it diverges (|z| > 2)
... fractal[h][w] = i # color the plane with the iteration number
... break # we're done, no need to keep iterating
... return fractal
...
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.000359961999493
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9851951990021917
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9806630249986483
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9808432169993466
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9776434310006152
and here's PyPy:
Python 3.9.18 | packaged by conda-forge | (9c4f8ef1, Mar 08 2024, 07:32:51)
[PyPy 7.3.15 with GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import timeit
>>>> import numpy as np
>>>> def run_python(height, width):
.... y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
.... c = x + y*1j
.... fractal = np.full(c.shape, 20, dtype=np.int32).tolist()
.... x, y, c = x.tolist(), y.tolist(), c.tolist()
.... for h in range(height):
.... for w in range(width): # for each pixel (h, w)...
.... z = c[h][w]
.... for i in range(20): # iterate at most 20 times
.... z = z**2 + c[h][w] # applying z → z² + c
.... if abs(z) > 2: # if it diverges (|z| > 2)
.... fractal[h][w] = i # color the plane with the iteration number
.... break # we're done, no need to keep iterating
.... return fractal
....
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6174074949994974
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6332233270004508
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.7321933960010938
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6341267679999874
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6687725560004765
Now we get PyPy about 1.5× faster than CPython (≈0.66 s vs. ≈0.98 s per 10 runs), which is in the ballpark of what I'd expect.
Personally, I'm still much more swayed by the 200× you can get through other methods. For any numerical work, I'd try to get the operations on numerical data compiled with known types: no boxing, no garbage collector, and all the rest.
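Even without leaving Python, you can push the per-pixel loop into typed, compiled array operations; here's a vectorized NumPy sketch of the same kernel (my own illustration, not one of the timed runs above):

import numpy as np

def run_numpy(height, width):
    y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
    c = x + y*1j
    z = c.copy()
    fractal = np.full(c.shape, 20, dtype=np.int32)
    for i in range(20):                            # iterate at most 20 times
        z = z**2 + c                               # update every pixel at once
        diverged = np.absolute(z) > 2              # all pixels past the threshold
        fractal[diverged & (fractal == 20)] = i    # first-time divergers get i
        z[diverged] = 2                            # clamp to avoid overflow warnings
    return fractal

Each step runs over the whole array in compiled C, at the cost of allocating a few temporary arrays per iteration.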
I got intrigued and found a dual scalar/handwritten portable-SIMD implementation of the Mandelbrot algorithm: https://pythonspeed.com/articles/optimizing-with-simd/
Makes sense, thank you!