@htv2012
Created January 29, 2020 14:23
Strip the accents from Unicode (e.g. Vietnamese) text
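import unicodedata


def strip_accents(text):
    """
    Strip the accents, replace đ/Đ with d, and lowercase the result.
    """
    # đ/Đ have no canonical decomposition, so map them by hand.
    text = text.replace("đ", "d").replace("Đ", "d")
    # NFD splits each accented letter into a base letter plus combining marks.
    text = unicodedata.normalize("NFD", text)
    # Encoding with "ignore" drops the marks and any other non-ASCII character.
    text = text.encode("ascii", "ignore")
    text = text.decode("ascii")
    text = text.lower()
    return text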
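
A short usage sketch may help show what the function returns. The block below restates the function compactly so it runs on its own; the sample strings are illustrative inputs chosen here, not part of the original gist.

```python
import unicodedata


def strip_accents(text):
    """Strip the accents, replace đ/Đ with d, and lowercase the result."""
    text = text.replace("đ", "d").replace("Đ", "d")
    text = unicodedata.normalize("NFD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return text.lower()


# Illustrative inputs (assumed for this example).
print(strip_accents("Đà Nẵng"))     # da nang
print(strip_accents("Tiếng Việt"))  # tieng viet

# Characters with no canonical decomposition are dropped, not transliterated.
print(strip_accents("Straße"))      # strae
```

Note the last case: because the ASCII encoding step ignores anything it cannot represent, letters such as ß disappear silently. If that matters for your data, they would need an explicit replacement, the same way đ is handled.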