Last active August 2, 2020 04:29
Python Script to Download Springer Textbooks
import requests
import wget
import pandas as pd

# Load the spreadsheet that lists the free Springer titles.
df = pd.read_excel("Free+English+textbooks.xlsx")

for index, row in df.iterrows():
    # loop through the excel list
    # Build the file name from title and edition, replacing '/' and ':'.
    file_name = f"{row.loc['Book Title']}_{row.loc['Edition']}".replace('/','-').replace(':','-')
    url = f"{row.loc['OpenURL']}"
    # requests follows any redirects; r.url is the final URL that was reached.
    r = requests.get(url)
    download_url = f"{r.url.replace('book','content/pdf')}.pdf"
    # Save into ./download (create this directory beforehand).
    wget.download(download_url, f"./download/{file_name}.pdf")
    print(f"downloading {file_name}.pdf Complete ....")
This is a great project.
But after running it, I get this error:
Here is the code setup in my Sublime Text 3 editor:
I installed all the necessary packages beforehand using the pip install ... command. What could be wrong?
UPDATE
I found out that there is a line-break character in some cells of the Edition column of Free+English+textbooks.xlsx.
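Stray characters such as line breaks in a cell can be stripped before the value is used in a file name. The following is only a minimal sketch of that idea, not part of the original script: the helper name `safe_file_name`, the set of replaced characters, and the sample title are all assumptions made for illustration.

```python
import re


def safe_file_name(title, edition):
    """Build a file-system-friendly name from a title and an edition cell.

    Hypothetical helper, not part of the original gist.
    """
    raw = f"{title}_{edition}"
    # Collapse any run of whitespace (including line breaks) into one space.
    collapsed = re.sub(r"\s+", " ", raw).strip()
    # Replace characters that are commonly rejected in file names.
    return re.sub(r'[\\/:*?"<>|]', "-", collapsed)


# Made-up example values; the edition cell contains a line break.
print(safe_file_name("Data: A/B Testing", "2nd ed.\n2019"))
```

In the loop, the result of such a helper could stand in for the `file_name` expression, called with `row.loc['Book Title']` and `row.loc['Edition']`.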
The problem now is that it downloads every file as a PDF smaller than 100 KB (which is unusual).
Please, what could be wrong this time?
reCAPTCHA, unfortunately :(
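If a challenge page is being returned instead of the book, the saved file would be HTML rather than a PDF, which would fit the small file sizes. One way to check is to look at the first bytes of each download, since a PDF file begins with the marker `%PDF-`. The sketch below is an assumption-laden illustration, not part of the original script: the helper names `looks_like_pdf` and `file_is_pdf` are hypothetical, and the sample byte strings are made up.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes begin with the PDF signature."""
    # A PDF file starts with the ASCII marker "%PDF-" followed by a version.
    return data.startswith(b"%PDF-")


def file_is_pdf(path: str) -> bool:
    """Read only the first five bytes of a saved file and check the signature."""
    with open(path, "rb") as fh:
        return looks_like_pdf(fh.read(5))


# Made-up samples: an HTML page saved with a .pdf extension versus a real header.
print(looks_like_pdf(b"<!DOCTYPE html><html><body>blocked</body></html>"))  # False
print(looks_like_pdf(b"%PDF-1.7\n"))  # True
```

Running `file_is_pdf` over the files in `./download` after the loop would flag any that are not PDFs, so they can be opened in a text editor to see what was actually sent back.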
Hi Joe, thanks for the code. I tried it, and it downloaded every title as a 13 KB PDF that won't open. If I download a file manually, it works. If I download it using wget in bash, it also works. Is there an explanation for why it does not work with Python's wget.download? Thanks a lot in advance.
The small "PDF" file is actually this text: