@h2rd
Created December 4, 2012 14:29
Strip HTML/XML tags and keep only the text
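try:
    # Preferred: parse with lxml and return the text content of every node
    import lxml.html

    def striphtml(data):
        tree = lxml.html.fromstring(data)
        return tree.text_content()
except ImportError:
    # Fallback when lxml is not installed: delete anything that looks like a tag
    import html
    import re

    TAG_RE = re.compile(r'<[^>]+>')

    def striphtml(data):
        # [^>]+ also matches tags spanning several lines; unescape decodes &amp; etc.
        return html.unescape(TAG_RE.sub('', data))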
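The regex fallback is only an approximation: it ends a tag at the first `>`, so a quoted attribute value containing `>` leaks into the output, and the bodies of `<script>` and `<style>` elements are kept as "text". A possible dependency-free alternative, sketched below, swaps the regex for the standard library's `html.parser.HTMLParser`, which tokenizes tags properly. The names `striphtml_stdlib` and `_TextExtractor` are hypothetical, and dropping script/style contents is an assumption about what "only text" should mean.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data, ignoring the contents of <script> and <style>."""

    # Assumption: script/style bodies are not text a reader would want to keep.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before handle_data
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element
        if not self._skip_depth:
            self.parts.append(data)


def striphtml_stdlib(data):
    parser = _TextExtractor()
    parser.feed(data)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(striphtml_stdlib('<p class="a>b">Fish &amp; chips<script>var x = 1;</script></p>'))
# prints: Fish & chips
```

Like the regex version, this does not insert whitespace between block elements, so `<p>a</p><p>b</p>` becomes `ab`; add a separator in `handle_endtag` if that matters for your input.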