import os
import requests, wget
import pandas as pd

df = pd.read_excel("Free+English+textbooks.xlsx")
# make sure the output folder exists before writing into it
os.makedirs("./download", exist_ok=True)
for index, row in df.iterrows():
    # loop through the excel list
    file_name = f"{row.loc['Book Title']}_{row.loc['Edition']}".replace('/','-').replace(':','-')
    url = f"{row.loc['OpenURL']}"
    # follow the redirect from the OpenURL to the book page
    r = requests.get(url)
    # rewrite the book page address into the PDF address
    download_url = f"{r.url.replace('book','content/pdf')}.pdf"
    wget.download(download_url, f"./download/{file_name}.pdf")
    print(f"downloading {file_name}.pdf Complete ....")
Hi HAKO441,
Please use this link: https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v5
Hi Joe, thanks for the code. I tried it, but it downloaded every title as a 13 KB PDF that won’t open. If I download a file manually, it works, and downloading with wget in bash also works. Is there an explanation for why it fails with Python's wget.download? Thanks a lot in advance.
The small "PDF" file is actually this text:
{Skip to main content}
This service is more advanced with JavaScript available, learn more at {http://activatejavascript.org}
{[SpringerLink] }
Search SpringerLink
{Search }
- {Home }
- {Log in }
You're almost there...
Over 10 million scientific documents at your fingertips
- {Home}
- {Impressum}
- {Legal information}
- {Privacy statement}
- {How we use cookies}
- {Cookie settings}
- {Accessibility}
- {Contact us}
{Springer Nature }
© 2020 Springer Nature Switzerland AG. Part of {Springer Nature}.
Not logged in Not affiliated
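One way to catch this case before a file is written is to check whether the downloaded bytes actually look like a PDF. The sketch below is only an illustration: the helper names `looks_like_pdf` and `save_if_pdf` are hypothetical, and the sample payloads are stand-ins rather than real Springer responses. It relies on the fact that PDF files begin with the `%PDF-` signature, whereas an HTML interstitial page like the one quoted above does not.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF signature appears near the start of the payload."""
    return PDF_MAGIC in data[:1024]


def save_if_pdf(data: bytes, path: str) -> bool:
    """Write data to path only when it looks like a PDF; return whether it was saved."""
    if not looks_like_pdf(data):
        return False
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return True


# Stand-in payloads (assumptions for illustration, not real responses)
html_page = b"<!DOCTYPE html><html><body>You're almost there...</body></html>"
fake_pdf = b"%PDF-1.4\n% minimal stand-in for a real file\n"

print(looks_like_pdf(html_page))  # False
print(looks_like_pdf(fake_pdf))   # True
```

In the script, the bytes could come from something like `requests.get(download_url).content`, and a `False` result would flag the titles that need to be retried or downloaded by hand.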
This is a great project.
But after running it, I get this error:
Here is the code setup in my Sublime Text 3 editor:
I installed all the necessary packages beforehand using the pip install ... command. Please, what could be wrong?
UPDATE
I found out that there is a /break character in some cells in the Edition column of Free+English+textbooks.xlsx.
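A possible workaround, assuming the "/break character" is a line break embedded in the cell, is to clean the file name before using it. The helper `safe_file_name` below is hypothetical, and the title and edition in the example are made up; it collapses any whitespace (including newlines) and replaces characters that common filesystems reject.

```python
import re


def safe_file_name(title, edition) -> str:
    """Build a file name from two spreadsheet cells, removing problem characters."""
    raw = f"{title}_{edition}"
    # collapse runs of whitespace, including embedded newlines and tabs
    cleaned = re.sub(r"\s+", " ", raw).strip()
    # replace characters that are not allowed in file names on common filesystems
    return re.sub(r'[\\/:*?"<>|]', "-", cleaned)


# Made-up example values with an embedded line break in the edition cell
print(safe_file_name("Linear Algebra: A Primer", "2nd ed.\n2019"))
# Linear Algebra- A Primer_2nd ed. 2019
```

In the loop, this would stand in for the two chained `.replace(...)` calls, for example `file_name = safe_file_name(row.loc['Book Title'], row.loc['Edition'])`.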
The error now is that it's downloading all files as PDFs under 100 KB (which is unusual).
Please, what could be wrong this time?
reCAPTCHA, unfortunately :(
I tried with Safari and Chrome but got the same result. I hope you can check it again when you have time.