@mkelley33
Forked from braveulysses/strip_tags.py
Created April 1, 2012 15:00
Strip HTML tags using BeautifulSoup
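# Requires BeautifulSoup 3 (Python 2); BeautifulStoneSoup and convertEntities are BeautifulSoup 3 APIs.
from BeautifulSoup import BeautifulStoneSoup


def strip(untrusted_html):
    """Strips out all tags from untrusted_html, leaving only text.

    Converts XML entities to Unicode characters. This is desirable because it
    reduces the likelihood that a filter further down the text processing chain
    will double-encode the XML entities.
    """
    soup = BeautifulStoneSoup(untrusted_html, convertEntities=BeautifulStoneSoup.ALL_ENTITIES)
    # Join every text node; the result is plain text, so escape it before re-embedding in HTML.
    safe_html = ''.join(soup.findAll(text=True))
    return safe_html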
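
For environments where BeautifulSoup 3 is not available, the same idea (drop every tag, keep the text, decode entities to Unicode) can be sketched with only the Python 3 standard library. This is not the gist author's method: it swaps in `html.parser.HTMLParser` for BeautifulSoup, and the names `_TextExtractor` and `strip_stdlib` are hypothetical, chosen for illustration. Handling of malformed markup may differ from BeautifulSoup's.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores every tag (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; before
        # the text reaches handle_data.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self._chunks.append(data)

    def text(self):
        return ''.join(self._chunks)


def strip_stdlib(untrusted_html):
    """Return the text of untrusted_html with all tags removed."""
    parser = _TextExtractor()
    parser.feed(untrusted_html)
    parser.close()  # flush any text still buffered at end of input
    return parser.text()


if __name__ == '__main__':
    print(strip_stdlib('<p>Fish &amp; <b>chips</b></p>'))
```

As with the original function, the return value is plain text rather than safe markup: an input such as `&lt;b&gt;` comes back as a literal `<b>`, so it should be escaped again (for example with `html.escape`) before being placed into an HTML page.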