@perrygeo
Created February 9, 2017 23:59
from concurrent import futures
from datetime import datetime

import requests


def threadmap(func, seq, concurrency=8):
    """Apply function to each item in seq in a concurrent threadpool

    Only beneficial for IO-bound tasks because of the GIL

    Parameters:
        func: function to process single item
        seq: iterable of items

    Yields:
        tuples of (item, func(item))

    Note that items are returned along with the result because the order
    of the returned values is not guaranteed.
    """
    with futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
        # Map each future back to the input item that produced it
        tasks = {executor.submit(func, item): item
                 for item in seq}
        # Yield results as they finish, not in submission order
        for f in futures.as_completed(tasks):
            initem = tasks[f]
            if f.exception() is not None:
                raise f.exception()
            else:
                yield initem, f.result()


data = ['https://example.com/?id={}'.format(x) for x in range(100)]


def time_fetch(url):
    start = datetime.now()
    requests.get(url)
    return (datetime.now() - start)


print(sum([t.total_seconds() for _, t in threadmap(time_fetch, data)]))
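
The docstring above notes that results come back in completion order, which is why each input item is yielded with its result. If input order matters, one alternative is `ThreadPoolExecutor.map`, which returns results in the order the inputs were given. The sketch below is not part of the original gist; `ordered_threadmap` and `square` are hypothetical names used only for illustration.

```python
from concurrent import futures


def ordered_threadmap(func, seq, concurrency=8):
    """Like threadmap, but yields (item, func(item)) in input order.

    Hypothetical helper for illustration; not part of the original gist.
    """
    items = list(seq)  # materialize so each item can be paired with its result
    with futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
        # executor.map returns results in the order the inputs were given
        # and re-raises a worker's exception when that result is reached.
        for item, result in zip(items, executor.map(func, items)):
            yield item, result


def square(x):
    return x * x


print(list(ordered_threadmap(square, range(5))))
# prints [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16)]
```

The trade-off is that a slow early item delays every result behind it, whereas `as_completed` hands back whatever finishes first.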

perrygeo commented Feb 10, 2017

We can time the script itself to see the benefit of threadmap.

$ time python threadmap.py
7.772231

real	0m1.124s
user	0m2.026s
sys	0m0.268s

The 100 HTTPS requests took almost 8 seconds in total when their individual durations are summed. Run concurrently, they finished in only ~1 second of wall-clock time.
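
The same comparison can be reproduced without network access. The sketch below assumes `time.sleep` is a fair stand-in for waiting on an HTTP response; `fake_fetch` and `compare` are hypothetical names, and `threadmap` is restated in condensed form so the block runs on its own. Exact timings will vary by machine.

```python
import time
from concurrent import futures


def threadmap(func, seq, concurrency=8):
    """Same approach as the gist: yield (item, func(item)) as tasks complete."""
    with futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
        tasks = {executor.submit(func, item): item for item in seq}
        for f in futures.as_completed(tasks):
            yield tasks[f], f.result()  # result() re-raises any worker exception


def fake_fetch(delay):
    """Stand-in for requests.get: sleep to simulate waiting on the network."""
    start = time.perf_counter()
    time.sleep(delay)
    return time.perf_counter() - start


def compare(n_tasks=16, delay=0.05, concurrency=8):
    """Return (summed per-task seconds, wall-clock seconds)."""
    wall_start = time.perf_counter()
    summed = sum(t for _, t in threadmap(fake_fetch, [delay] * n_tasks, concurrency))
    wall = time.perf_counter() - wall_start
    return summed, wall


summed, wall = compare()
print("summed: {:.3f}s  wall: {:.3f}s".format(summed, wall))
```

With 16 tasks and 8 workers the tasks run in roughly two batches, so the wall-clock time should land well below the summed time, mirroring the gap between the 7.77 s total and the ~1 s `real` time above.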

@perrygeo

Note: on Python 2 this requires the futures backport module, installed with pip install futures. On Python 3, concurrent.futures is part of the standard library.
