@victormurcia
Created September 5, 2022 04:46
download_all_books: download every book listed on a Project Gutenberg search-results page
from os import makedirs

# NOTE: download_url, get_urls_from_html, get_book_identifiers and
# download_book are helper functions defined outside this snippet.

def download_all_books(url, save_path):
    # download the page that lists the books
    data = download_url(url)
    print(f'.downloaded {url}')
    # extract all links from the page
    links = get_urls_from_html(data)
    print(f'.found {len(links)} links on the page')
    # retrieve all unique book ids
    book_ids = get_book_identifiers(links)
    print(f'.found {len(book_ids)} unique book ids')
    # create the save directory if needed
    makedirs(save_path, exist_ok=True)
    # download and save each book in turn
    for book_id in book_ids:
        print(book_id)
        # download and save this book
        result = download_book(book_id, save_path)
        # report result
        print(result)
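
The function above calls get_urls_from_html and get_book_identifiers, which are not part of this gist. The following is a minimal sketch of how those two helpers might look, using only the standard library. The implementations, the _LinkCollector class, and the assumption that book links end in /ebooks/<number> are illustrative guesses, not the author's code.

```python
import re
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Hypothetical helper class: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)


def get_urls_from_html(content):
    # accept either raw bytes or already-decoded text
    if isinstance(content, bytes):
        content = content.decode('utf-8', errors='replace')
    parser = _LinkCollector()
    parser.feed(content)
    return parser.links


def get_book_identifiers(links):
    # assumption: a book link ends in /ebooks/<digits>
    pattern = re.compile(r'/ebooks/(\d+)$')
    book_ids = set()
    for link in links:
        match = pattern.search(link)
        if match:
            book_ids.add(match.group(1))
    # a set keeps each id once, so repeated links are ignored
    return book_ids
```

Returning a set is one simple way to get the "unique book ids" that download_all_books reports. The download order is then unspecified, which does not matter when every book is saved to its own file.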