@luisenriquecorona
Created February 8, 2020 02:17
Transform some Western typographical symbols into ASCII.
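import unicodedata

# One-to-one replacements: each symbol maps to a single ASCII character.
single_map = str.maketrans("""‚ƒ„†ˆ‹‘’“”•–—˜›""",
                           """'f"*^<''""---~>""")

# One-to-many replacements: each symbol maps to an ASCII string.
multi_map = str.maketrans({
    '€': '<euro>',
    '…': '...',
    'Œ': 'OE',
    '™': '(TM)',
    'œ': 'oe',
    '‰': '<per mille>',
    '‡': '**',
})

# Merge both tables into a single translation map.
multi_map.update(single_map)


def dewinize(txt):
    """Replace Win1252 symbols with ASCII chars or sequences"""
    return txt.translate(multi_map)


def asciize(txt):
    # NOTE: shave_marks_latin is not defined in this snippet; it must be
    # defined or imported before asciize() is called.
    no_marks = shave_marks_latin(dewinize(txt))
    no_marks = no_marks.replace('ß', 'ss')  # 'ß' has no decomposition
    # NFKC replaces compatibility characters with their canonical equivalents.
    return unicodedata.normalize('NFKC', no_marks)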
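
asciize() calls shave_marks_latin, which this snippet does not define. The following is a minimal sketch of what such a helper might look like, assuming its job is to remove diacritics only from Latin base characters and leave other scripts untouched. The implementation is an assumption based on the function's name, not something confirmed by the gist.

```python
import string
import unicodedata


def shave_marks_latin(txt):
    """Hypothetical helper: remove combining marks from Latin base characters."""
    # NFD splits each accented character into a base character plus combining marks.
    norm_txt = unicodedata.normalize('NFD', txt)
    latin_base = False
    keepers = []
    for c in norm_txt:
        if unicodedata.combining(c) and latin_base:
            continue  # drop a diacritic attached to a Latin base character
        keepers.append(c)
        # A non-combining character starts a new base character.
        if not unicodedata.combining(c):
            latin_base = c in string.ascii_letters
    # NFC recomposes whatever marks were kept (for example on Greek letters).
    return unicodedata.normalize('NFC', ''.join(keepers))


print(shave_marks_latin('S\u00e3o Paulo'))  # Sao Paulo
```

With a helper like this in scope, asciize() first replaces the Win1252 symbols, then strips Latin diacritics, then expands 'ß' and applies NFKC normalization.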