@simonw
Created February 12, 2010 12:04
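import re

# Whitelisted tags: <b>, <i>, <blockquote> and <a href="...">. The capturing
# group makes p.split() return the matched tags along with the text between them.
p = re.compile(
    r'(<b>|</b>|<i>|</i>|<blockquote>|</blockquote>|<a href="[^"]+">|</a>)',
    re.IGNORECASE
)

# Escape & first so the entities produced for > and < are not escaped again.
escape = lambda s: s.replace(
    '&', '&amp;'
).replace(
    '>', '&gt;'
).replace(
    '<', '&lt;'
)

def escape_unwhitelisted_tags(html):
    """
    This is NOT a complete solution for sanitising potentially malicious
    HTML. You still need to ensure that the resulting tags are correctly
    balanced, and it's VITAL that you add an additional step to check that
    users have not used dodgy protocols like javascript: in their a tags.
    """
    s = []
    for token in p.split(html):
        if p.match(token):
            # Whitelisted tag: pass it through unchanged.
            s.append(token)
        else:
            # Everything else is treated as text and escaped.
            s.append(escape(token))
    return ''.join(s)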
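
The docstring calls for an extra step that rejects dodgy protocols such as javascript: in a tags. The following is a minimal sketch of one way to do that, not part of the original snippet. The names `has_safe_href`, `A_TAG` and `ALLOWED_PREFIXES` are hypothetical, and the allow-list itself (absolute http, https and mailto links only, so relative URLs are rejected) is an assumption you may want to change. It decodes HTML entities and drops whitespace and control characters before checking the prefix, since browsers tolerate both in a scheme.

```python
import html
import re

# Hypothetical allow-list: only absolute links with these prefixes are accepted.
ALLOWED_PREFIXES = ("http://", "https://", "mailto:")

# Same shape as the <a href="..."> alternative in the whitelist pattern.
A_TAG = re.compile(r'<a href="([^"]+)">', re.IGNORECASE)

def has_safe_href(tag):
    """Return True unless tag is an <a href="..."> tag with a disallowed URL."""
    m = A_TAG.fullmatch(tag)
    if m is None:
        # Not an opening a tag (e.g. <b> or </a>), so there is no href to check.
        return True
    # Decode entities such as &#106; the way a browser would.
    href = html.unescape(m.group(1))
    # Remove spaces and control characters, then compare case-insensitively.
    href = re.sub(r'[\x00-\x20]', '', href).lower()
    return href.startswith(ALLOWED_PREFIXES)
```

One way to use it would be to tighten the check in escape_unwhitelisted_tags so that a token is passed through only when it matches p and has_safe_href(token) is true, and is escaped otherwise. Tag balancing still needs to be handled separately.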