@acrymble
Created June 30, 2011 14:13
HTML to List
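# html-to-list1.py
# Python 3: urllib.request replaces Python 2's urllib2
import urllib.request
import dh  # local helper module (dh.py), not part of the standard library; provides stripTags()

url = 'http://www.oldbaileyonline.org/print.jsp?div=t17800628-33'
response = urllib.request.urlopen(url)
xhtml = response.read().decode('utf-8')  # bytes -> str (assumes the page is UTF-8)

# Remove the HTML markup, then apply the string method lower()
# so that "The" and "the" are treated as the same word.
text = dh.stripTags(xhtml).lower()

# split() with no argument breaks the text on any run of whitespace.
wordlist = text.split()
print(wordlist)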
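The script depends on `dh.stripTags`, whose source is not included here. The sketch below assumes that it simply discards everything between `<` and `>`; `strip_tags` and `html_to_list` are hypothetical stand-ins, not the contents of `dh.py`. They run the same strip, lowercase, and split pipeline on an invented sample string, so no network access is needed.

```python
def strip_tags(html):
    """Hypothetical stand-in for dh.stripTags: drop every character between '<' and '>'."""
    inside_tag = False
    chars = []
    for char in html:
        if char == '<':
            inside_tag = True          # entering a tag: stop keeping characters
        elif char == '>' and inside_tag:
            inside_tag = False         # leaving a tag: resume keeping characters
        elif not inside_tag:
            chars.append(char)         # ordinary text outside any tag
    return ''.join(chars)


def html_to_list(html):
    """Strip tags, lowercase, and split on whitespace, mirroring the script above."""
    text = strip_tags(html).lower()
    return text.split()


# Invented sample markup, used only for illustration.
sample = '<html><body><p>Hello <b>World</b>, this is a test.</p></body></html>'
print(html_to_list(sample))  # ['hello', 'world,', 'this', 'is', 'a', 'test.']
```

Two consequences of this naive approach are worth noting. Punctuation stays attached to words (`'world,'`, `'test.'`) because `split()` only breaks on whitespace. Also, adjacent tags with no whitespace between them glue words together: `html_to_list('<p>one</p><p>two</p>')` returns `['onetwo']`.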