@cben
Last active August 29, 2015 14:23
Convert numeric entities (`&#123;`) to Unicode
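#!/usr/bin/python3
# Caution: this converts entities everywhere, including inside <script>...</script> or CDATA sections.
import os
import re


def numeric_entities_to_utf8(fname):
    """Replace decimal numeric entities (&#NNN;) in fname with the characters they encode, in place."""
    # newline='' disables newline translation, so existing line endings are preserved.
    with open(fname, encoding='utf8', newline='') as f:
        s = f.read()
    s = re.sub(r'&#([0-9]+);', lambda m: chr(int(m.group(1))), s)
    with open(fname, 'w', encoding='utf8', newline='') as f:
        f.write(s)


# Rewrite every .html file under the current directory.
for path, dirs, files in os.walk('.'):
    for fname in files:
        if fname.endswith('.html'):
            numeric_entities_to_utf8(os.path.join(path, fname))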
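
The script above decodes every decimal entity it finds, so `&#60;` becomes a literal `<` and can change how the page parses. Below is a minimal sketch of a more conservative variant, not part of the original gist: it assumes you want to keep markup-significant characters encoded, also accept hexadecimal entities such as `&#x7B;`, and leave out-of-range code points alone. The names `MARKUP_CHARS`, `ENTITY_RE`, and `decode_numeric_entities` are hypothetical.

```python
import re

# Assumption: these characters carry meaning in HTML markup, so their entities stay encoded.
MARKUP_CHARS = set('<>&"\'')

# Matches decimal (&#123;) and hexadecimal (&#x7B;) numeric entities.
ENTITY_RE = re.compile(r'&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));')


def decode_numeric_entities(text):
    """Hypothetical helper: decode numeric entities, leaving unsafe ones untouched."""
    def replace(match):
        dec, hexa = match.groups()
        code = int(dec) if dec is not None else int(hexa, 16)
        # Leave NUL, surrogates, and values beyond the Unicode range as written.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return match.group(0)
        char = chr(code)
        return match.group(0) if char in MARKUP_CHARS else char
    return ENTITY_RE.sub(replace, text)


print(decode_numeric_entities('caf&#233; &#x7B;x&#x7D; &#60;b&#62;'))
# prints: café {x} &#60;b&#62;
```

To try it, the `re.sub(...)` line in `numeric_entities_to_utf8` could be swapped for `s = decode_numeric_entities(s)`. Like the original, this still does not skip `<script>` or CDATA sections.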