@acrymble
Created June 30, 2011 13:22
HTML to list 1
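#html-to-list1.py
# Download an Old Bailey trial page, strip the HTML, and split the text into words.
import urllib.request
import obo  # helper module from the Programming Historian lessons (provides stripTags)

url = 'http://www.oldbaileyonline.org/print.jsp?div=t17800628-33'
response = urllib.request.urlopen(url)
# read() returns bytes in Python 3; decoding as UTF-8 is an assumption about the page
html = response.read().decode('utf-8')
text = obo.stripTags(html)  # remove markup, leaving plain text
wordlist = text.split()     # split on whitespace into a list of words
print(wordlist[0:120])      # show the first 120 words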
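
The script above relies on `obo.stripTags`, which is not defined in this gist. As a rough illustration of what such a helper might do, here is a minimal sketch that drops everything between `<` and `>`. The name `strip_tags_sketch` and the sample string are hypothetical, chosen only for this example; the real `obo` module may behave differently.

```python
def strip_tags_sketch(page_contents):
    """Return page_contents with everything between '<' and '>' removed.

    Hypothetical stand-in for obo.stripTags; not the actual obo code.
    """
    inside_tag = False
    chars = []
    for char in page_contents:
        if char == '<':
            inside_tag = True           # start of a tag: stop copying
        elif char == '>' and inside_tag:
            inside_tag = False          # end of a tag: resume copying
        elif not inside_tag:
            chars.append(char)          # ordinary text outside any tag
    return ''.join(chars)


# Made-up sample input, so the sketch runs without a network connection.
sample_html = '<p>A <b>short</b> sample sentence.</p>'
wordlist = strip_tags_sketch(sample_html).split()
print(wordlist[0:3])
```

A character-by-character scan like this is easy to follow but does not handle comments, scripts, or HTML entities; for real pages the standard library's `html.parser` would likely be a sturdier choice.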
@nelsonviiera

I found this in a lesson from the Programming Historian, thank you!
