Last active
August 2, 2020 04:29
Python Script to Download Springer Textbooks
import requests, wget
import pandas as pd

df = pd.read_excel("Free+English+textbooks.xlsx")
for index, row in df.iterrows():
    # loop through the excel list
    file_name = f"{row.loc['Book Title']}_{row.loc['Edition']}".replace('/','-').replace(':','-')
    url = f"{row.loc['OpenURL']}"
    r = requests.get(url)
    download_url = f"{r.url.replace('book','content/pdf')}.pdf"
    wget.download(download_url, f"./download/{file_name}.pdf")
    print(f"downloading {file_name}.pdf Complete ....")
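The script derives the PDF link by replacing every occurrence of 'book' in the resolved URL, which could also alter a URL that contains "book" somewhere else. A minimal sketch of a narrower rewrite, assuming the resolved URL has a single /book/ path segment; the helper name pdf_url_from_book_url and the example URL are hypothetical, not part of the gist:

```python
def pdf_url_from_book_url(book_url):
    """Turn a resolved book page URL into a direct PDF URL (hypothetical helper)."""
    # Replace only the first "/book/" path segment, so "book" elsewhere
    # in the URL (for example inside an identifier) is left untouched.
    return book_url.replace("/book/", "/content/pdf/", 1) + ".pdf"


# Example URL is made up for illustration.
print(pdf_url_from_book_url("https://link.springer.com/book/10.1007/978-0-000-00000-0"))
```

In the loop, this would stand in for the r.url.replace(...) expression, with r.url passed as the argument.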
Please, what could be wrong this time?
reCAPTCHA, unfortunately :(
This is a great project.
But after running it, I get this error:
Here is the code setup in my Sublime Text 3 editor:
I had already installed all the necessary packages using the pip install ... command. Please, what could be wrong?
UPDATE:
I found out that there is a /break character in some cells in the Edition column of Free+English+textbooks.xlsx.
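For anyone hitting the same thing, here is a rough sketch of cleaning the file name before saving. It assumes the problem cells contain line breaks or other characters that are not valid in file names; the helper name safe_file_name is made up for illustration and is not part of the gist:

```python
import re


def safe_file_name(title, edition):
    """Build a file name from title and edition (hypothetical helper)."""
    raw = f"{title}_{edition}"
    # Collapse any run of whitespace (including line breaks and tabs) into one space.
    raw = re.sub(r"\s+", " ", raw).strip()
    # Replace characters that are not allowed in file names on common platforms.
    return re.sub(r'[\\/:*?"<>|]', "-", raw)


print(safe_file_name("Data Structures/Algorithms: An Intro", "2nd ed.\n2019"))
```

In the script, the result could be used in place of the chained .replace('/','-').replace(':','-') calls.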
The problem now is that every file it downloads is a PDF smaller than 100 KB (which is unusual).
Please, what could be wrong this time?
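One way to narrow this down is to check whether the small files are really PDFs. A valid PDF normally begins with the bytes %PDF-, so a saved HTML page (for example a reCAPTCHA challenge) would fail the check. This is only a sketch; the function names are hypothetical and the path in the usage comment assumes the script's ./download folder:

```python
def looks_like_pdf(first_bytes):
    """Return True if the given leading bytes carry the PDF header signature."""
    return first_bytes.startswith(b"%PDF-")


def is_pdf_file(path):
    """Read the first five bytes of a file and test them for the PDF signature."""
    with open(path, "rb") as fh:
        return looks_like_pdf(fh.read(5))


# Usage (path is an assumption based on the script's output folder):
# print(is_pdf_file("./download/some_book.pdf"))
```

If the check fails, opening the file in a text editor should show what the server actually returned.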