A little experiment to demonstrate that Python's use of threads provides concurrency and NOT parallelism.
The script processes a batch of random numbers, in an attempt to leave the runtime little chance of optimizing some values away. The final value is also printed, not because it's interesting, but so that the runtime cannot optimize on the basis that a variable remains unused.
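A workload of this kind could look roughly like the sketch below. It is not the code from throc.py: the function name crunch, the seed parameter and the value range are assumptions made purely for illustration.

```python
import random


def crunch(count, seed=None):
    """Sum `count` pseudo-random integers (hypothetical stand-in for the real workload)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(count):
        # Each value is only known at run time, so the loop has to be executed.
        total += rng.randint(0, 1_000_000)
    return total


if __name__ == "__main__":
    value = crunch(100_000)
    # Printing the result means the computed value is actually used.
    print(f"v={value}")
```

Passing a seed makes a run repeatable; leaving it out gives a different value on each run, much like the v= figures in the outputs.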
When running the experiment and displaying htop in another screen, it is visible that multi-processing is happening in procs mode (all CPUs blaze for a short while), whilst in threads mode the processing shows just some blips here and there as the main python process gets scheduled wherever it can.
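The difference between the two modes can be sketched as follows, assuming the same CPU-bound function is handed either to threading.Thread or to multiprocessing.Process. The names busy, timed and run_all are hypothetical and the real script may be organised differently.

```python
import multiprocessing
import threading
import time


def busy(n):
    """CPU-bound loop in pure Python (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(label, n):
    """Run the workload once and report how long this worker took."""
    start = time.perf_counter()
    value = busy(n)
    print(f"{label}: {time.perf_counter() - start:.2f}s (v={value})")


def run_all(worker_cls, prefix, workers, n):
    """Start `workers` workers of the given class, wait for all, return wall-clock seconds."""
    jobs = [
        worker_cls(target=timed, args=(f"{prefix}-{i + 1}", n))
        for i in range(workers)
    ]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    threads_total = run_all(threading.Thread, "t", 4, n)
    procs_total = run_all(multiprocessing.Process, "p", 4, n)
    print(f"threads: {threads_total:.2f}s total, procs: {procs_total:.2f}s total")
```

Thread and Process accept the same target and args arguments, which is why one helper can drive both modes. Watching htop while this runs should show the same pattern as described above, although the exact timings depend on the machine.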
I also did an implementation in Go which uses goroutines. Observing its behaviour in htop indicates that goroutines do leverage multiple OS threads in some way, as go run gorun.go 10 clearly has all CPUs firing on all cylinders. Adjust the count for your own CPU loadout.
Example outputs on a 12-CPU machine:
Multi-processing shows each process taking around 4 seconds:
$ python3 throc.py 10 procs
Start ...
p-10: 3.94s (v=6249008669209)
p-9: 3.95s (v=6248862477347)
p-7: 3.96s (v=6250177286888)
p-1: 4.01s (v=6248649473623)
p-8: 4.02s (v=6250077417878)
p-3: 4.04s (v=6252533118171)
p-5: 4.03s (v=6252144942960)
p-6: 4.04s (v=6248147988780)
p-4: 4.04s (v=6249178151022)
p-2: 4.07s (v=6252541851112)
Multi-threading, however, shows each thread taking around 12 to 14 seconds, as the 10 threads take turns executing:
$ python3 throc.py 10 threads
Start ...
t-8: 11.91s (v=6247859913691)
t-5: 12.52s (v=6250697271959)
t-7: 12.65s (v=6249371957701)
t-1: 13.04s (v=6251484020177)
t-4: 13.09s (v=6247671509492)
t-2: 13.29s (v=6253259321454)
t-9: 13.15s (v=6248279784496)
t-10: 13.19s (v=6250442979962)
t-6: 13.59s (v=6250007390597)
t-3: 13.64s (v=6248431702777)
With regard to a single process or thread, and to plain sequential instances:
$ python3 throc.py 1 procs
Start ...
p-1: 1.13s (v=6248543853309)
$ python3 throc.py 1 threads
Start ...
t-1: 1.11s (v=6251955885588)
$ python3 throc.py 4 inst
Start ...
i-1: 1.10s (v=6247870992448)
i-2: 1.11s (v=6249785772042)
i-3: 1.12s (v=6250020650060)
i-4: 1.10s (v=6252481604511)
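Each process seems to incur an overhead vaguely commensurate with the number of processes running: about 4 seconds each with 10 processes, against 1.13 seconds for a single one.
Threads, however, definitely interleave: each takes roughly as many times longer as there are threads, about 12 to 14 seconds with 10 threads against 1.11 seconds for a single one.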
Running the inst mode, plain execution instances with no concurrency, demonstrates the expected base speed of the operation.
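To put numbers on the comparison, the timings from the example outputs can be averaged and set against the single-worker runs. The figures below are copied from the outputs above. The sequential estimate is an assumption: it simply takes 10 back-to-back runs at the single-thread time.

```python
from statistics import mean

# Per-worker timings in seconds, copied from the example outputs.
procs = [3.94, 3.95, 3.96, 4.01, 4.02, 4.04, 4.03, 4.04, 4.04, 4.07]
threads = [11.91, 12.52, 12.65, 13.04, 13.09, 13.29, 13.15, 13.19, 13.59, 13.64]
single_proc = 1.13
single_thread = 1.11

# How much slower each worker is compared with running alone.
proc_slowdown = mean(procs) / single_proc
thread_slowdown = mean(threads) / single_thread

# Assumed cost of 10 runs done one after another with no concurrency.
sequential_estimate = 10 * single_thread

print(f"procs:   mean {mean(procs):.2f}s, {proc_slowdown:.1f}x a single process")
print(f"threads: mean {mean(threads):.2f}s, {thread_slowdown:.1f}x a single thread")
print(f"10 sequential runs (estimate): {sequential_estimate:.1f}s")
print(f"slowest thread: {max(threads):.2f}s, slowest process: {max(procs):.2f}s")
```

On these figures the 10 processes finish in about 4 seconds in total, while the 10 threads need more than 13 seconds, which is longer than the estimated 11.1 seconds for running the same work sequentially.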
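Conclusion: for CPU-bound work like this, Python definitely does not benefit from OS threads. In the standard CPython build the global interpreter lock lets only one thread execute Python code at a time, whereas separate processes do run in parallel.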