@yurivictor
Last active December 11, 2015 05:48
Get all the transcripts from West Wing episodes
import json

import requests
from pyquery import PyQuery as pq


def get_transcripts():
    # The query string is supplied through `params`, so the base URL carries none.
    url = 'http://www.westwingtranscripts.com/search.php'
    # Request transcript ids 1 through 155.
    for x in range( 1, 156 ):
        payload = { 'flag': 'getTranscript', 'id': x }
        response = requests.get( url, params=payload )
        # The transcript text sits inside the page's <pre> element.
        transcript_text = pq( response.content )( 'pre' ).text()
        # Print one JSON array per episode: [id, {"transcript": ...}].
        print( json.dumps( [ x, { 'transcript': transcript_text } ] ) )


get_transcripts()
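
The script prints one JSON array per episode, of the form `[id, {"transcript": ...}]`. A minimal sketch of reading that output back into a dictionary follows. The helper name `load_transcripts` and the sample line are hypothetical and not part of the original gist. It uses only the standard library.

```python
import json


def load_transcripts(lines):
    """Map episode id -> transcript text from the script's JSON lines.

    Hypothetical helper: assumes each non-blank line is one JSON array
    of the form [id, {"transcript": "..."}], as printed by the script.
    """
    transcripts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        episode_id, record = json.loads(line)
        transcripts[episode_id] = record['transcript']
    return transcripts


# Placeholder input for illustration only, not real transcript text.
sample = ['[1, {"transcript": "placeholder text"}]\n', '\n']
print(load_transcripts(sample))
```

If the script's output were redirected to a file, the open file object could be passed straight to `load_transcripts`, since iterating over a file yields its lines.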