@lxneng
Forked from simonw/escape_unwhitelisted_tags.py
Created February 25, 2010 06:11
import re

# Matches only the whitelisted tags. The capturing group makes p.split()
# return the matched tags along with the text between them.
p = re.compile(
    r'(<b>|</b>|<i>|</i>|<blockquote>|</blockquote>|<a href="[^"]+">|</a>)',
    re.IGNORECASE
)

# Escape '&' first so the entities produced below are not escaped again.
escape = lambda s: s.replace(
    '&', '&amp;'
).replace(
    '>', '&gt;'
).replace(
    '<', '&lt;'
)

def escape_unwhitelisted_tags(html):
    """
    This is NOT a complete solution for sanitising potentially malicious
    HTML. You still need to ensure that the resulting tags are correctly
    balanced, and it's VITAL that you add an additional step to check that
    users have not used dodgy protocols like javascript: in their a tags.
    """
    s = []
    for token in p.split(html):
        if p.match(token):
            # Whitelisted tag: pass through unchanged.
            s.append(token)
        else:
            # Everything else is treated as text and escaped.
            s.append(escape(token))
    return ''.join(s)
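
The docstring asks for an extra step that rejects dodgy protocols such as javascript: in a tags. The block below is a minimal sketch of one way to do that, not part of the original gist. The names `SAFE_SCHEMES`, `A_TAG` and `has_safe_href` are hypothetical, and the allow-list of schemes is an assumption to adjust for your application. The sketch is deliberately strict: an href with no explicit scheme (a relative URL) is rejected too.

```python
import html
import re

# Hypothetical allow-list (an assumption, not from the original gist).
SAFE_SCHEMES = {"http", "https", "mailto"}

# Same shape as the opening anchor tag accepted by the whitelist pattern.
A_TAG = re.compile(r'<a href="([^"]+)">', re.IGNORECASE)

# A URL scheme: a letter followed by letters, digits, '+', '.' or '-', then ':'.
SCHEME = re.compile(r'([a-z][a-z0-9+.-]*):', re.IGNORECASE)


def has_safe_href(tag):
    """Return True unless tag is an opening <a> tag whose href fails the check."""
    m = A_TAG.fullmatch(tag)
    if m is None:
        # Not an opening anchor tag, so there is no href to check.
        return True
    # Decode character references first, since "&#58;" is a colon to a browser.
    href = html.unescape(m.group(1))
    # Drop whitespace and control characters that could be used to disguise
    # a scheme such as "java\tscript:".
    href = re.sub(r'[\x00-\x20]', '', href)
    scheme = SCHEME.match(href)
    if scheme is None:
        # No explicit scheme: rejected in this strict sketch.
        return False
    return scheme.group(1).lower() in SAFE_SCHEMES
```

One possible way to wire this in is to change the condition in the loop to `p.match(token) and has_safe_href(token)`, so that an anchor with a rejected scheme is escaped like any other text. The matching `</a>` would still pass through, so the balancing problem mentioned in the docstring remains.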