@james-prickett
Created March 6, 2011 18:12
A simple screen scraper to pull IMDb IDs
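from lxml.html import parse


class ImdbService(object):
    def get_ids_from_page(self, url):
        """Return the IMDb title IDs linked from the page at `url`."""
        ids = []
        # Select every href attribute whose value contains "/title/".
        links = parse(url).xpath('//a/@href[contains(.,"/title/")]')
        for link in links:
            # Assumes hrefs of the form "/title/<id>/": drop the prefix and the trailing slash.
            ids.append(link[len('/title/'):len(link) - 1])
        return ids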
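
The slice in get_ids_from_page only yields a clean ID when an href looks exactly like "/title/<id>/". Hrefs that carry a query string, an absolute URL, or a trailing path segment would come out mangled, and the same title can appear more than once on a page. A minimal sketch of a more tolerant extraction step follows. The helper name extract_imdb_ids and the "tt followed by digits" pattern are assumptions for illustration, not part of the original gist.

```python
import re

# Assumption: IMDb title IDs look like "tt" followed by digits.
# The pattern is searched anywhere in the href, so absolute URLs also match.
_TITLE_ID = re.compile(r"/title/(tt\d+)")


def extract_imdb_ids(hrefs):
    """Return the unique title IDs found in `hrefs`, in first-seen order."""
    seen = []
    for href in hrefs:
        match = _TITLE_ID.search(href)
        # Skip hrefs without a title ID and IDs that were already collected.
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

The list returned by the xpath call in get_ids_from_page could be passed straight to this helper in place of the slicing loop, since it only needs an iterable of href strings.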