Skip to content

Instantly share code, notes, and snippets.

@jpivarski
Last active March 12, 2025 16:23
Show Gist options
  • Save jpivarski/da343abd8024834ee8c5aaba691aafc7 to your computer and use it in GitHub Desktop.
Save jpivarski/da343abd8024834ee8c5aaba691aafc7 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@itamarst
Copy link

My experience is that PyPy doesn't interact well with NumPy. I think this is part of the motivation for the HPy project (https://hpyproject.org/).

@jpivarski
Copy link
Author

Good point... CPython is a little faster when dealing with its own builtin types, too.

Replacing the NumPy data structure with a list of lists (in the hot part of the loop), here's CPython again:

Python 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:33:10) 
[GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> import numpy as np
>>> def run_python(height, width):
...     y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
...     c = x + y*1j
...     fractal = np.full(c.shape, 20, dtype=np.int32).tolist()
...     x, y, c = x.tolist(), y.tolist(), c.tolist()
...     for h in range(height):
...         for w in range(width):                  # for each pixel (h, w)...
...             z = c[h][w]
...             for i in range(20):                 # iterate at most 20 times
...                 z = z**2 + c[h][w]              # applying z → z² + c
...                 if abs(z) > 2:                  # if it diverges (|z| > 2)
...                     fractal[h][w] = i           # color the plane with the iteration number
...                     break                       # we're done, no need to keep iterating
...     return fractal
... 
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
1.000359961999493
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9851951990021917
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9806630249986483
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9808432169993466
>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.9776434310006152

and here's pypy:

Python 3.9.18 | packaged by conda-forge | (9c4f8ef1, Mar 08 2024, 07:32:51)
[PyPy 7.3.15 with GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>> import timeit
>>>> import numpy as np
>>>> def run_python(height, width):
....     y, x = np.ogrid[-1:0:height*1j, -1.5:0:width*1j]
....     c = x + y*1j
....     fractal = np.full(c.shape, 20, dtype=np.int32).tolist()
....     x, y, c = x.tolist(), y.tolist(), c.tolist()
....     for h in range(height):
....         for w in range(width):                  # for each pixel (h, w)...
....             z = c[h][w]
....             for i in range(20):                 # iterate at most 20 times
....                 z = z**2 + c[h][w]              # applying z → z² + c
....                 if abs(z) > 2:                  # if it diverges (|z| > 2)
....                     fractal[h][w] = i           # color the plane with the iteration number
....                     break                       # we're done, no need to keep iterating
....     return fractal
.... 
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6174074949994974
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6332233270004508
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.7321933960010938
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6341267679999874
>>>> timeit.timeit(lambda: run_python(200, 300), number=10)
0.6687725560004765

Now we get pypy being about 1.7× faster than CPython, which is in the ballpark of what I'd expect.

Personally, I'm still a lot more swayed by the 200× that you get through other methods. For any numerical work, I'd try to get the operation on numerical data compiled with known types, no boxing, no garbage collectors, and all the rest.

@itamarst
Copy link

itamarst commented Oct 17, 2024

I got intrigued and went and found a dual scalar / handwritten portable SIMD implementation of the Mandelbrot algorithm: https://pythonspeed.com/articles/optimizing-with-simd/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment