@binki
Last active December 30, 2017 04:15

Initial tweet by luceleaftea: https://twitter.com/luceleaftea/status/946938980574089216 Follow-up: https://twitter.com/ohnobinki/status/946944544528007168

Ran them all to see which ones are faster. Some of the timings are too close together to tell whether a variant is actually better or the difference is just fluctuation in the environment.

ohnobinki@gibby ~/downloads $ for x in $(seq 0 4); do echo -n "doStuff(${x}) "; python3 -m timeit -s 'from blahstar import doStuff' "doStuff(${x})"; done
doStuff(0) 10 loops, best of 3: 211 msec per loop
doStuff(1) 10 loops, best of 3: 179 msec per loop
doStuff(2) 10 loops, best of 3: 179 msec per loop
doStuff(3) 10 loops, best of 3: 193 msec per loop
doStuff(4) 100 loops, best of 3: 4.36 msec per loop
ohnobinki@gibby ~/downloads $ for x in $(seq 0 4); do echo -n "doStuff(${x}) "; python3 -m timeit -s 'from blahstar import doStuff' "doStuff(${x})"; done
doStuff(0) 10 loops, best of 3: 223 msec per loop
doStuff(1) 10 loops, best of 3: 184 msec per loop
doStuff(2) 10 loops, best of 3: 185 msec per loop
doStuff(3) 10 loops, best of 3: 180 msec per loop
doStuff(4) 100 loops, best of 3: 4.22 msec per loop
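For completeness, the same best-of-N measurement can be taken from inside Python via the timeit module instead of the shell one-liner above. This is a sketch with a stand-in workload, not the actual blahstar benchmark:

```python
import timeit

def work():
    # stand-in workload; in the real benchmark this would call doStuff(n)
    return sum(x * x for x in range(1000))

# timeit.repeat mirrors `python3 -m timeit`: run several trials and keep the
# minimum, which is the measurement least polluted by background load.
times = timeit.repeat(work, number=100, repeat=3)
print('best of 3: %.3f msec per loop' % (min(times) / 100 * 1e3))
```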

So the original algorithm’s best run is 211 ms. The per-x-value starmap() variants all land around 180 ms, which is only a modest improvement. Running starmap() over y values instead (each worker computes a whole row at a time) gets us down to about 4 ms.

I think we can conclude that the overhead of calling starmap() per x value is far too high. Each item requires IPC, which is expensive, and the per-row calls also introduce unnecessary synchronization: every x-value task must finish before any worker can proceed to the next y value.
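A middle ground worth noting (my addition, not something tried in the gist): starmap() accepts a chunksize argument that batches many argument tuples into each IPC round trip, which recovers much of the row-at-a-time benefit without restructuring the caller. A minimal sketch:

```python
import multiprocessing.pool

def add(a, b):
    return a + b

if __name__ == '__main__':
    with multiprocessing.pool.Pool(2) as pool:
        # chunksize=25 ships 25 argument tuples per worker message instead of
        # one at a time, cutting the per-item IPC overhead.
        result = pool.starmap(add, [(i, i) for i in range(100)], chunksize=25)
    print(result[:5])
```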

#!/usr/bin/env python3
import multiprocessing.pool
from math import sin

# I’m not good at making noise xD
def generateNoiseValue(nz, sizeX, sizeY, x, y, redistPower):
    return redistPower * (sin(x / sizeX) + sin(y / sizeY)) + nz

def generateNoiseValueRow(nz, sizeX, sizeY, y, redistPower):
    return [generateNoiseValue(nz, sizeX, sizeY, x, y, redistPower) for x in range(0, sizeX)]

pool = multiprocessing.pool.Pool()

def doStuff(strategy):
    elevation = []
    sizeX = 100
    sizeY = 100
    nz = 23
    redistribPower = 2
    # Original strategy. Extra indirection via immediately invoked lambda.
    if strategy == 0:
        for y in range(0, sizeY):
            elevation.append(pool.starmap(generateNoiseValue, [(lambda x: (nz, sizeX, sizeY, x, y, redistribPower))(x) for x in range(0, sizeX)]))
    # My first suggestion: use a generator and no lambda.
    elif strategy == 1:
        for y in range(0, sizeY):
            elevation.append(pool.starmap(generateNoiseValue, ((nz, sizeX, sizeY, x, y, redistribPower) for x in range(0, sizeX))))
    # Tyler’s response: at least in his codebase, a list comprehension is faster than a generator.
    # This might be true, but it’s hard to tell because the run times are too close.
    elif strategy == 2:
        for y in range(0, sizeY):
            elevation.append(pool.starmap(generateNoiseValue, [(nz, sizeX, sizeY, x, y, redistribPower) for x in range(0, sizeX)]))
    # Replace the outer for loop with a list comprehension because those are “slightly faster”.
    elif strategy == 3:
        elevation = [pool.starmap(generateNoiseValue, [(nz, sizeX, sizeY, x, y, redistribPower) for x in range(0, sizeX)]) for y in range(0, sizeY)]
    # starmap() per y value instead of per x value. Thus we send chunks of work and get
    # chunks back. Much faster, and noticeable because generateNoiseValue() is very fast.
    elif strategy == 4:
        elevation = pool.starmap(generateNoiseValueRow, [(nz, sizeX, sizeY, y, redistribPower) for y in range(0, sizeY)])
    else:
        # Raising a bare string is a TypeError in Python 3; raise a real exception.
        raise ValueError('unknown strategy')
    if len(elevation) != sizeY:
        raise AssertionError('wrong number of rows!')
    for e in elevation:
        if len(e) != sizeX:
            raise AssertionError('row has wrong length!')
    return elevation
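As a sanity check (my addition, not in the original gist), the per-row strategy can be verified to produce the same grid as the per-cell strategies; here with standalone copies of the noise functions so the snippet runs on its own, without a pool:

```python
from math import sin

def generateNoiseValue(nz, sizeX, sizeY, x, y, redistPower):
    return redistPower * (sin(x / sizeX) + sin(y / sizeY)) + nz

def generateNoiseValueRow(nz, sizeX, sizeY, y, redistPower):
    return [generateNoiseValue(nz, sizeX, sizeY, x, y, redistPower) for x in range(sizeX)]

# The row strategy is just the per-cell strategy regrouped, so each row it
# returns must equal the list of per-cell results for that y.
sizeX = sizeY = 10
for y in range(sizeY):
    row = generateNoiseValueRow(23, sizeX, sizeY, y, 2)
    cells = [generateNoiseValue(23, sizeX, sizeY, x, y, 2) for x in range(sizeX)]
    assert row == cells
print('row strategy matches per-cell strategy')
```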