@saghul
Created July 5, 2012 10:12
HTML to text
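# Based on http://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python
# Note: Python 2 only (cStringIO, formatter and htmllib are not available in Python 3).

from cStringIO import StringIO
from formatter import AbstractFormatter, DumbWriter
from htmllib import HTMLParser, HTMLParseError


def html2text(data):
    """Return the plain-text rendering of an HTML string, or '' if parsing fails."""
    f = StringIO()
    # DumbWriter writes the formatted plain text into the in-memory buffer f.
    parser = HTMLParser(AbstractFormatter(DumbWriter(f)))
    try:
        parser.feed(data)
    except HTMLParseError:
        # Malformed markup: give up and return an empty string.
        return ''
    else:
        # Flush any buffered data before reading the result.
        parser.close()
        return f.getvalue()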
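
The snippet above depends on modules that exist only in Python 2. As a rough sketch of the same idea for Python 3, the version below uses the standard library's html.parser instead. The names _TextExtractor and html2text_py3 are hypothetical and not part of the original gist, and the output will not match the formatter-based version exactly: there is no line wrapping, and the newline handling for block tags is a simplification chosen for illustration. html.parser is lenient and does not raise HTMLParseError, so the error branch of the original has no counterpart here.

```python
from html.parser import HTMLParser
from io import StringIO


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes, skipping <script>/<style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._buf = StringIO()
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div"):
            # Simplification: start a new line for a few block-level tags.
            self._buf.write("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.write(data)

    def text(self):
        return self._buf.getvalue()


def html2text_py3(data):
    """Return the text content of an HTML string (Python 3 sketch)."""
    parser = _TextExtractor()
    parser.feed(data)
    parser.close()  # flush any text still buffered by the parser
    return parser.text().strip()
```

For example, html2text_py3("<p>Hello <b>world</b></p>") yields "Hello world".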