@mdespriee
Created December 23, 2015 13:15
Unicode string normalization
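import re
import unicodedata

# ASSUMPTION: the gist does not show these two patterns; the definitions below are a guess.
_regexAlpha = re.compile(r"[^\w\s]|_", re.UNICODE)  # anything but letters, digits, whitespace
_regexSpace = re.compile(r"\s+", re.UNICODE)        # runs of whitespace

def normalize(s):
    """Expects a unicode string, not an encoded byte string.
    Returns a unicode string.
    """
    # Decompose with NFKD, then drop combining marks (accents, diacritics).
    out = ''.join(c for c in unicodedata.normalize("NFKD", s)
                  if not unicodedata.combining(c))
    out = _regexAlpha.sub(' ', out)  # replace unwanted characters with a space
    out = _regexSpace.sub(' ', out)  # collapse whitespace runs into a single space
    out = out.strip().upper()
    return out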
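
To see what the first step of normalize() does on its own, here is a minimal sketch of the NFKD decomposition followed by removal of combining marks. The helper name strip_accents and the sample string are assumptions made for illustration and are not part of the gist.

```python
import unicodedata

def strip_accents(s):
    # Hypothetical helper, not from the gist: isolates the accent-stripping step.
    # NFKD splits "\u00e9" into "e" + U+0301 (COMBINING ACUTE ACCENT)
    # and expands compatibility characters such as the ligature U+FB01 into "fi".
    decomposed = unicodedata.normalize("NFKD", s)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return u"".join(c for c in decomposed if not unicodedata.combining(c))

sample = u"Caf\u00e9 \ufb01n"
print(strip_accents(sample))  # -> Cafe fin
```

NFKD is the compatibility decomposition, so it also rewrites characters like ligatures and full-width letters. If only accents should be removed and such characters left alone, "NFD" would be the narrower choice.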