"""
Script to download all the books in a humble bundle.
May work for other resources, but don't have anything to test against.

To use, run from the directory you want to download the books in.
Pass the "game" key as the first argument (look in the URL of your normal download page).
To restrict to certain formats, pass them as extra positional arguments on the command line.
Example:
    python humble_bundle_download abcdef12345 mobi pdf
If no formats are passed, then all will be downloaded.
After this you'll have a new directory with all the books downloaded in the selected formats.

As written this script requires Python >= 3.6 due to use of f-strings.
Should be trivial to convert to other versions.

Thanks to https://www.schiff.io/projects/humble-bundle-api for discovering API endpoints.
Although that page mentions the API call we use requiring login, it worked without it
for me in the one case I've used it for. YMMV.
"""
import json
import os
import sys

import requests
from concurrent.futures import ThreadPoolExecutor, wait


def queue_downloads(game_key, *formats):
    # Format names are compared case-insensitively.
    formats = {f.lower() for f in formats}
    api_url = f'https://hr-humblebundle.appspot.com/api/v1/order/{game_key}'
    response = requests.get(api_url)
    response.raise_for_status()
    data = json.loads(response.text)

    # Everything is saved into a directory named after the bundle.
    bundle_name = data['product']['machine_name']
    dirname = os.path.join('.', bundle_name)
    try:
        os.mkdir(dirname)
    except FileExistsError:
        pass

    futures = []
    with ThreadPoolExecutor() as executor:
        for product in data['subproducts']:
            base = product['human_name']
            # Map each lower-cased format name to its web download URL.
            formats_to_urls = {
                dl_struct['name'].lower(): dl_struct['url']['web']
                for download in product['downloads']
                for dl_struct in download['download_struct']
            }
            if not formats_to_urls:
                print(f'Warning! No downloads found for {base}...?')
                continue
            # Keep only the requested formats (all of them if none were given).
            dl_data = {
                url: os.path.join(dirname, f'{base}.{fmt}')
                for fmt, url in formats_to_urls.items() if (not formats or fmt in formats)
            }
            if not dl_data:
                print(f'Warning! Not downloading {base} due to no acceptable formats.')
                continue
            futures.extend([executor.submit(do_download, *args) for args in dl_data.items()])
        wait(futures)


def do_download(url, out_path):
    r = requests.get(url)
    r.raise_for_status()
    # The with block closes the file even if the write fails.
    with open(out_path, 'wb') as fd:
        fd.write(r.content)


if __name__ == '__main__':
    queue_downloads(*sys.argv[1:])
Thanks a million. Works a treat.
Very useful, thanks!
However I do have a few comments:
IT APPEARS TO BE A SEVERE MEMORY HOG FOR LARGE BUNDLES!
When I checked why my system was suddenly slowing down, python was using over 2GB of memory and thus I had to start swapping. So I closed a few unneeded applications, hoping that it'd be soon done, but before I could finish that, it crashed the whole system. Not cool.
Is the open in line 69 missing a close?
For those not familiar with Python whose default version is older than the 3.6 needed by this script: if 3.6 is also installed, you can select it explicitly with python3.6 humble_bundle_download XXkexXX.
And how do you download videos with it (shipped as zip files)? Specifying zip does not seem to work. Running it without any format specifiers seems to grab them, but this just ends in crashing the whole system because of memory constraints.
(Ironically, I was using this to download a bunch of books about python, so I already learned somethin…)
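On the zip question: the script compares the requested formats against the lower-cased dl_struct['name'] of each download, so zip only matches if the API happens to label that download "zip". A minimal sketch for seeing which labels a bundle actually offers is below. The helper name available_formats and the sample data are hypothetical; the sample only mimics the fields the script reads and is not real API output (in particular, the "Download" label for the video is an assumption).

```python
def available_formats(order_data):
    """Map each lower-cased format name to the titles that offer it.

    order_data is assumed to have the same shape the script reads:
    subproducts -> downloads -> download_struct -> name.
    """
    formats = {}
    for product in order_data.get('subproducts', []):
        for download in product.get('downloads', []):
            for dl_struct in download.get('download_struct', []):
                name = dl_struct.get('name')
                if name:
                    formats.setdefault(name.lower(), []).append(product['human_name'])
    return formats


# Hypothetical sample, shaped like the fields used by queue_downloads.
sample = {
    'subproducts': [
        {
            'human_name': 'Book A',
            'downloads': [{
                'download_struct': [
                    {'name': 'PDF', 'url': {'web': 'https://example.invalid/a.pdf'}},
                    {'name': 'EPUB', 'url': {'web': 'https://example.invalid/a.epub'}},
                ],
            }],
        },
        {
            'human_name': 'Video B',
            'downloads': [{
                'download_struct': [
                    {'name': 'Download', 'url': {'web': 'https://example.invalid/b.zip'}},
                ],
            }],
        },
    ],
}

for fmt, titles in available_formats(sample).items():
    print(fmt, titles)
```

Whatever names this prints for a real order are the values to pass as format arguments.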
fantastic-- worked like a charm. thank you. 💯
@bobschi sorry for the late reply, didn't realise there are no notifications for gist comments. Feel free to do anything you want with the script, if you're feeling kind you can add an attribution.
@philroche @dagrha - thanks! Glad it helped.
@joha1 haha that's kind of funny. No, the with open(...) is called a context manager - it ensures close() is called no matter what. The downloads are running concurrently, each one holds its whole file in memory (r.content) until it is written, and Python isn't very good at releasing memory, so if you're downloading videos then I wouldn't be surprised about it using 2GB of memory. The default ThreadPoolExecutor launches up to 5 * multiprocessing.cpu_count() threads (on Python 3.5 to 3.7; from 3.8 the default is min(32, os.cpu_count() + 4)), so you could reduce the memory by using fewer threads here.
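A rough sketch of both memory fixes, assuming nothing beyond the requests calls already used in the script plus its stream=True and iter_content options: write each file in chunks instead of reading r.content, and cap the number of worker threads. The names save_stream, do_download_streaming and download_all, the 1 MiB chunk size and the default of 4 workers are illustrative choices, not part of the original script.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def save_stream(chunks, out_path):
    """Write an iterable of byte chunks to out_path, one chunk at a time.

    Returns the number of bytes written.
    """
    written = 0
    with open(out_path, 'wb') as fd:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                fd.write(chunk)
                written += len(chunk)
    return written


def do_download_streaming(url, out_path, chunk_size=1024 * 1024):
    """Variant of do_download that holds at most one chunk in memory."""
    import requests  # imported here so save_stream is usable without requests
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        return save_stream(r.iter_content(chunk_size=chunk_size), out_path)


def download_all(dl_data, max_workers=4):
    """Download every url -> out_path pair using a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(do_download_streaming, url, path)
            for url, path in dl_data.items()
        ]
        wait(futures)
    return futures
```

With this shape, peak memory is roughly max_workers times chunk_size rather than the combined size of every file being downloaded at once. Calling result() on each returned future would also surface any download error that wait() alone leaves unreported.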
I'm getting a syntax error:
  File "humble_bundle_download.py", line 33
    api_url = f'https://hr-humblebundle.appspot.com/api/v1/order/{game_key}'
                                                                           ^
SyntaxError: invalid syntax
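A SyntaxError pointing at an f-string is what a Python interpreter older than 3.6 reports, which matches the note in the script's docstring. If upgrading is not an option, the f-strings can be swapped for str.format. A minimal sketch for the API URL line follows; build_api_url and supports_fstrings are hypothetical helper names, and the other f-strings in the script would need the same treatment.

```python
import sys

# Same URL as in the script, written as a str.format template.
API_TEMPLATE = 'https://hr-humblebundle.appspot.com/api/v1/order/{game_key}'


def build_api_url(game_key):
    """Build the order API URL without using an f-string."""
    return API_TEMPLATE.format(game_key=game_key)


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given interpreter version understands f-strings."""
    return tuple(version_info[:2]) >= (3, 6)


if not supports_fstrings():
    print('This interpreter predates f-strings; use str.format instead.')
```

A version check like this cannot live in the same file as the f-strings themselves, because an older interpreter fails while parsing the file, before any code runs.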
Change the URL in line 33 to use https://www.humblebundle.com/ instead of https://hr-humblebundle.appspot.com/.
First off: thanks, this looks great! :)
How do I authorize for downloading? When running the script, I get the following error:
(I've replaced my key with the original example key.)
Eh, just read through the description again. Seems we need to add a way to handle logins. Do you mind if I add that on, and package this a little more nicely for installation with homebrew? :)