@AlessandroVaccarino
Created June 13, 2020 21:37
A simple script to download SlideShare slides (as images)
import os
import requests
import scrapy

presentationLink = '...'
# Use the last path segment as the output folder name; rstrip('/') guards against a trailing slash
presentationId = presentationLink.rstrip("/").split("/")[-1]
userAgent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'
headers = {'User-Agent': userAgent}

presentationResponse = requests.get(presentationLink, headers=headers, timeout=30)
if presentationResponse.status_code != 200:
    print("ERROR " + str(presentationResponse.status_code) + " while scraping " + presentationLink)
else:
    presentationContent = presentationResponse.text
    presentationSelector = scrapy.Selector(text=presentationContent)
    # Full-size slide URLs live in the data-full attribute of each <section> > <img> in the player;
    # this selector targets the SlideShare markup the script was written against
    slidesLinks = presentationSelector.xpath('//*[@id="svPlayerId"]/div[1]/div[2]').css('section > img').xpath('@data-full').getall()
    # Save next to this script; abspath avoids writing to "/<id>" when __file__ has no directory part
    presentationOutPath = os.path.join(os.path.dirname(os.path.abspath(__file__)), presentationId)
    os.makedirs(presentationOutPath, exist_ok=True)
    for slideNumber, slideLink in enumerate(slidesLinks):
        slideResponse = requests.get(slideLink, headers=headers, timeout=30)
        slideResponse.raise_for_status()  # stop instead of saving an error page as a .jpg
        with open(os.path.join(presentationOutPath, 'slide' + str(slideNumber) + '.jpg'), 'wb') as slideFile:
            slideFile.write(slideResponse.content)
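The script names the output folder after whatever follows the last "/" in the link, so a URL carrying a query string (for example `?from_search=1`) would produce an awkward folder name. A minimal sketch of a sturdier alternative, assuming the presentation slug is always the last non-empty path segment; `presentation_id_from_link` is a hypothetical helper, not part of the original script, and the example URL is made up:

```python
from urllib.parse import urlparse


def presentation_id_from_link(link: str) -> str:
    """Return the last non-empty path segment of a presentation URL.

    Hypothetical helper: unlike split("/")[-1] on the raw string, it ignores
    query strings, fragments and trailing slashes.
    """
    # urlparse separates the path from the "?query" and "#fragment" parts
    path = urlparse(link).path
    # Drop the empty segments produced by leading/trailing slashes
    segments = [segment for segment in path.split("/") if segment]
    if not segments:
        raise ValueError("no path segment in " + repr(link))
    return segments[-1]


print(presentation_id_from_link("https://www.slideshare.net/someuser/some-talk/?from_search=1"))
# prints some-talk
```

Its result could be assigned to `presentationId` in place of the `split("/")[-1]` expression.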
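The scraping step relies on Scrapy only to evaluate `section > img` and read `@data-full`. As a hedged sketch of the same idea with the standard library's `html.parser` instead of Scrapy (a swapped-in technique, not what the script uses), the parser below tracks open elements to approximate the child combinator. Unlike the script, it does not restrict matches to the `#svPlayerId` container, and the sample markup and `example.com` URLs are illustrative only:

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not be pushed on the stack
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SlideImageParser(HTMLParser):
    """Collect data-full attributes of <img> tags whose direct parent is <section>."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open non-void elements
        self.links = []      # collected data-full URLs, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # Direct-child check: the innermost open element must be <section>
            if self.open_tags and self.open_tags[-1] == "section":
                value = dict(attrs).get("data-full")
                if value:
                    self.links.append(value)
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags of void elements (e.g. from <img/>) and stray end tags
        if tag in VOID_TAGS or tag not in self.open_tags:
            return
        # Pop up to and including the matching open tag
        while self.open_tags.pop() != tag:
            pass


def slide_links(html: str) -> list:
    parser = SlideImageParser()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """
<div id="svPlayerId"><div></div><div>
  <section><img data-full="https://example.com/slide-1.jpg"></section>
  <section><img data-full="https://example.com/slide-2.jpg"/></section>
  <section><div><img data-full="https://example.com/nested.jpg"></div></section>
</div></div>
"""
print(slide_links(sample))
# prints ['https://example.com/slide-1.jpg', 'https://example.com/slide-2.jpg']
```

The nested image is skipped because its direct parent is a `<div>`, which matches how `section > img` behaves in CSS.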