Initial tweet from luceleaftea: https://twitter.com/luceleaftea/status/946938980574089216
Followup from ohnobinki: https://twitter.com/ohnobinki/status/946944544528007168
Ran them all to see which ones are faster. Some of the timings are too close together to tell whether a variant is actually better or the difference is just fluctuation in the environment.
ohnobinki@gibby ~/downloads $ for x in $(seq 0 4); do echo -n "doStuff(${x}) "; python3 -m timeit -s 'from blahstar import doStuff' "doStuff(${x})"; done
doStuff(0) 10 loops, best of 3: 211 msec per loop
doStuff(1) 10 loops, best of 3: 179 msec per loop
doStuff(2) 10 loops, best of 3: 179 msec per loop
doStuff(3) 10 loops, best of 3: 193 msec per loop
doStuff(4) 100 loops, best of 3: 4.36 msec per loop
ohnobinki@gibby ~/downloads $ for x in $(seq 0 4); do echo -n "doStuff(${x}) "; python3 -m timeit -s 'from blahstar import doStuff' "doStuff(${x})"; done
doStuff(0) 10 loops, best of 3: 223 msec per loop
doStuff(1) 10 loops, best of 3: 184 msec per loop
doStuff(2) 10 loops, best of 3: 185 msec per loop
doStuff(3) 10 loops, best of 3: 180 msec per loop
doStuff(4) 100 loops, best of 3: 4.22 msec per loop
So the original algorithm’s best run is 211 ms. All of the variants that call starmap() once per x value get us to roughly 180 ms. And starmap() over y values (each worker does a whole row at a time) gets us down to roughly 4 ms. I think we can conclude that the overhead of starmap() is far too high to use per x value: each task requires IPC, which is expensive, and it inserts unnecessary synchronization (all x-value workers must finish before work on the next y value can start).
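To make the two task granularities concrete, here is a minimal sketch of the pattern being compared. It assumes the workload fills a 2-D grid by computing some function per (x, y) cell; cell() and the grid dimensions here are stand-ins, not the code from blahstar. The per-cell version submits one starmap() task per coordinate pair (one pickle round-trip per cell), while the per-row version submits one task per y value, so each worker computes a whole row locally and IPC happens once per row.

```python
from multiprocessing import Pool

WIDTH, HEIGHT = 8, 8  # hypothetical grid size

def cell(x, y):
    # Stand-in for the real per-cell computation.
    return x * y

def row(y):
    # One task per row: the worker loops over x locally,
    # so only the finished row crosses the process boundary.
    return [cell(x, y) for x in range(WIDTH)]

def grid_per_cell(pool):
    # One starmap() task per (x, y) pair: every cell incurs its own
    # IPC round-trip, and each row implicitly waits on all its cells.
    pairs = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    flat = pool.starmap(cell, pairs)
    return [flat[y * WIDTH:(y + 1) * WIDTH] for y in range(HEIGHT)]

def grid_per_row(pool):
    # One starmap() task per y value: far fewer, larger tasks.
    return pool.starmap(row, [(y,) for y in range(HEIGHT)])

if __name__ == "__main__":
    with Pool(2) as pool:
        assert grid_per_cell(pool) == grid_per_row(pool)
```

Both produce the same grid; only the task granularity differs. Note also that Pool.starmap() accepts a chunksize argument that batches several small tasks into one IPC message, which can recover much of the per-row speedup without restructuring the work.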