Skip to content

Instantly share code, notes, and snippets.

@Radcliffe
Last active August 29, 2015 14:23
Show Gist options
  • Save Radcliffe/4c4cb3cab2598815aacc to your computer and use it in GitHub Desktop.
Save Radcliffe/4c4cb3cab2598815aacc to your computer and use it in GitHub Desktop.
Find the titles and running times of videos in a YouTube playlist by scraping the web page
# Python script to list the titles and running times of the videos
# in a YouTube playlist.
# Note: Only works for up to 100 videos. TODO: Fix this bug.
# The playlist id can be found in the playlist's URL.
playlist_id = "PLt5AfwLFPxWLNZRKWlcRmTABh_SExiiCj"
from lxml import html
import requests
url = "https://www.youtube.com/playlist?list=%s" % playlist_id
page = requests.get(url)
tree = html.fromstring(page.text)
body = tree.body
items = body.get_element_by_id('pl-load-more-destination')
total_seconds = 0
for item in items:
try:
running_time = item[6][0][0][0].text
mm, ss = map(int, running_time.split(':'))
total_seconds += 60*mm + ss
title = item.get('data-title')
print (("%s\t%s" % (running_time, title)))
except:
pass
mm = total_seconds // 60
ss = total_seconds % 60
print '%d:%02d\tTOTAL' % (mm, ss)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment