Created September 5, 2022 04:46
Gist: victormurcia/ea58bbd3321a8efb0b2dfd08b45f0a67
download_all_books: download all books listed by a search query on Project Gutenberg
```python
from os import makedirs


def download_all_books(url, save_path):
    """Download every book linked from the page at `url` into `save_path`.

    Relies on the helpers download_url, get_urls_from_html,
    get_book_identifiers and download_book, which are not shown in this gist.
    """
    # download the page that lists the books (e.g. top books or search results)
    data = download_url(url)
    print(f'.downloaded {url}')
    # extract all links from the page
    links = get_urls_from_html(data)
    print(f'.found {len(links)} links on the page')
    # retrieve all unique book ids
    book_ids = get_book_identifiers(links)
    print(f'.found {len(book_ids)} unique book ids')
    # create the save directory if needed
    makedirs(save_path, exist_ok=True)
    # download and save each book in turn
    for book_id in book_ids:
        print(book_id)
        # download and save this book
        result = download_book(book_id, save_path)
        # report result
        print(result)
```
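The gist calls `get_urls_from_html(data)` but does not show it. A minimal sketch, assuming the helper only needs the `href` value of every `<a>` tag and that the page is UTF-8 encoded, can be built on the standard library's `html.parser`; the class `_LinkCollector` is a hypothetical name introduced here, not part of the original code.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def get_urls_from_html(content):
    """Return all <a href> values found in an HTML page (bytes or str)."""
    if isinstance(content, bytes):
        # assumption: the page is UTF-8 encoded
        content = content.decode("utf-8", errors="replace")
    parser = _LinkCollector()
    parser.feed(content)
    parser.close()
    return parser.links
```

Using the stdlib parser avoids a third-party dependency; a library such as BeautifulSoup would work equally well if it is already installed.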
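`get_book_identifiers(links)` is also not shown. One possible sketch, assuming book pages are linked with a path of the form `/ebooks/<number>` (relative or absolute), keeps only those links and removes duplicates while preserving the order in which ids first appear; the `_BOOK_PATH` pattern is an assumption about the site's URL layout.

```python
import re
from urllib.parse import urlparse

# assumption: Project Gutenberg book pages live at /ebooks/<number>
_BOOK_PATH = re.compile(r"^/ebooks/(\d+)/?$")


def get_book_identifiers(links):
    """Return the unique numeric book ids found in `links`, in first-seen order."""
    ids = []
    seen = set()
    for link in links:
        # urlparse handles both "/ebooks/84" and "https://host/ebooks/84"
        match = _BOOK_PATH.match(urlparse(link).path)
        if match:
            book_id = int(match.group(1))
            if book_id not in seen:
                seen.add(book_id)
                ids.append(book_id)
    return ids
```

Returning an ordered list rather than a plain set makes the download order follow the page's ranking, which is convenient when reading the progress output.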
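The remaining helpers, `download_url(url)` and `download_book(book_id, save_path)`, could look like the sketch below. It assumes that `download_url` returns `None` on failure, and that the plain-text edition of a book is served from `https://www.gutenberg.org/cache/epub/<id>/pg<id>.txt`. The `book_text_url` function, the `timeout` parameter, and the `fetch` parameter (added so the download step can be swapped out in tests) are all hypothetical.

```python
from os.path import exists, join
from urllib.request import urlopen


def download_url(url, timeout=10):
    """Fetch `url` and return its body as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as connection:
            return connection.read()
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URLs
        return None


def book_text_url(book_id):
    # assumption: the plain-text edition is served from this cache path
    return f"https://www.gutenberg.org/cache/epub/{book_id}/pg{book_id}.txt"


def download_book(book_id, save_path, fetch=download_url):
    """Save one book's plain text into `save_path` and return a status message."""
    filepath = join(save_path, f"{book_id}.txt")
    if exists(filepath):
        # skip books saved by a previous run
        return f".skipped {filepath}"
    data = fetch(book_text_url(book_id))
    if data is None:
        return f".failed to download {book_id}"
    with open(filepath, "wb") as file:
        file.write(data)
    return f".saved {filepath}"
```

Returning a status string instead of raising lets the loop in `download_all_books` print a result for every book and carry on past individual failures.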
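The `url` passed to `download_all_books` is the page that lists the books. For a search query, a sketch of building that URL might look like this, assuming the search results page is `https://www.gutenberg.org/ebooks/search/` and takes the search terms in a `query` parameter; `search_url` and `SEARCH_BASE` are hypothetical names.

```python
from urllib.parse import urlencode

# assumption: the search results page accepts the terms in a "query" parameter
SEARCH_BASE = "https://www.gutenberg.org/ebooks/search/"


def search_url(query):
    """Build the URL of a Project Gutenberg search results page for `query`."""
    # urlencode percent-encodes the value and turns spaces into "+"
    return f"{SEARCH_BASE}?{urlencode({'query': query})}"
```

With the helpers in place, a run would then be something like `download_all_books(search_url("jane austen"), "books")`.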