@madhead
Created April 30, 2020 20:33
springer.py
import sys
import csv
import requests
from pathlib import Path

if __name__ == '__main__':
    # Books are saved next to the CSV file passed as the first argument.
    target_dir = Path(sys.argv[1]).parent
    with open(sys.argv[1], newline='') as csv_file:
        # DictReader already consumes the header row; an extra next() would skip the first book.
        reader = csv.DictReader(csv_file, delimiter=",", quoting=csv.QUOTE_ALL)
        for row in reader:
            print(f'Processing {row["Item Title"]} by {row["Authors"]}')
            try:
                print("Downloading PDF")
                url = f'https://link.springer.com/content/pdf/{row["Item DOI"]}.pdf'
                response = requests.get(url)
                # Treat very small responses as error pages rather than books.
                if len(response.content) < 500:
                    raise Exception("Not a PDF")
                with open(target_dir / f'{row["Item Title"]} - {row["Authors"]}.pdf', 'wb') as pdf_file:
                    pdf_file.write(response.content)
            except Exception:
                print(f'Failed to download PDF for {row["Item Title"]} by {row["Authors"]}')
            try:
                print("Downloading EPUB")
                url = f'https://link.springer.com/download/epub/{row["Item DOI"]}.epub'
                response = requests.get(url)
                if len(response.content) < 500:
                    raise Exception("Not an EPUB")
                with open(target_dir / f'{row["Item Title"]} - {row["Authors"]}.epub', 'wb') as epub_file:
                    epub_file.write(response.content)
            except Exception:
                print(f'Failed to download EPUB for {row["Item Title"]} by {row["Authors"]}')

CrazyCoder commented May 1, 2020

Improved version: https://gist.github.com/CrazyCoder/2e2788c1542b93869c8b31948cde198a

  • Can continue download (skips existing files)
  • Removes special characters from the file names so that the file path is always valid (paths with : and other special chars will fail on Windows)
  • Refactored code a bit to remove duplication for PDF/EPUB
  • Proper file name encoding
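
The first two points can be sketched roughly as follows. This is not the code from the linked gist: `safe_filename` and `save_if_missing` are hypothetical helper names, and the set of rejected characters is an assumption based on Windows file-name rules. The download itself is passed in as a callable so the sketch does not depend on any particular HTTP library.

```python
import re
from pathlib import Path
from typing import Callable

# Assumption: characters that Windows rejects in file names, plus ASCII control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name: str, replacement: str = "_") -> str:
    """Replace characters that are invalid in Windows file names."""
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows also rejects names that end in a dot or a space.
    return cleaned.strip().rstrip(". ")


def save_if_missing(target: Path, fetch: Callable[[], bytes]) -> bool:
    """Write the bytes returned by fetch() to target unless the file already exists.

    Returns True if the file was written, False if it was skipped.
    """
    if target.exists():
        return False
    target.write_bytes(fetch())
    return True
```

With the script above, the PDF branch could call something like `save_if_missing(target_dir / f'{safe_filename(title)}.pdf', lambda: requests.get(url).content)`, so that re-running the script only fetches books that are not on disk yet.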

madhead commented May 1, 2020

👍
