@fongfan999
Last active September 25, 2019 10:28
Ruby script to sanitize Vietnamese "marks"
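# Map each group of accented characters to its unaccented base letter.
# Note: the 'abc': 'x' syntax makes the keys Symbols, not Strings.
dic = {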
'áàảãạăắặằẳẵâấầẩẫậ': 'a',
'ÁÀẢÃẠĂẮẶẰẲẴÂẤẦẨẪẬ': 'A',
'đ': 'd',
'Đ': 'D',
'éèẻẽẹêếềểễệ': 'e',
'ÉÈẺẼẸÊẾỀỂỄỆ': 'E',
'íìỉĩị': 'i',
'ÍÌỈĨỊ': 'I',
'óòỏõọôốồổỗộơớờởỡợ': 'o',
'ÓÒỎÕỌÔỐỒỔỖỘƠỚỜỞỠỢ': 'O',
'úùủũụưứừửữự': 'u',
'ÚÙỦŨỤƯỨỪỬỮỰ': 'U',
'ýỳỷỹỵ': 'y',
'ÝỲỶỸỴ': 'Y'
}
from_string = dic.keys.join
# => áàảãạăắặằẳẵâấầẩẫậÁÀẢÃẠĂẮẶẰẲẴÂẤẦẨẪẬđĐ...ýỳỷỹỵÝỲỶỸỴ
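# Repeat each base letter once per source character so that both strings
# line up position by position, as String#tr requires.
to_string = ''
dic.each { |key, value| to_string << (value * key.length) }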
# => aaaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAAAdD...yyyyyYYYYY
str = 'Xin chào, Tôi là Phan Thái Phong'
str.tr(from_string, to_string) # => Xin chao, Toi la Phan Thai Phong
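
If the input may contain decomposed (NFD) characters, where a base letter is followed by separate combining marks, a fixed tr table of precomposed characters could miss them. A minimal alternative sketch, not the gist's method: decompose the string with String#unicode_normalize, drop the combining marks, and map đ/Đ by hand because they have no decomposition. The method name strip_marks is hypothetical, and \p{Mn} removes nonspacing marks of every script, not only Vietnamese ones.

```ruby
# Hypothetical helper, not part of the original gist.
def strip_marks(str)
  # Split each accented letter into a base letter plus combining marks.
  decomposed = str.unicode_normalize(:nfd)
  # Drop every nonspacing combining mark (tone marks, breve, circumflex, horn).
  without_marks = decomposed.gsub(/\p{Mn}/, '')
  # đ/Đ are standalone letters with no decomposition, so map them directly.
  without_marks.tr('đĐ', 'dD')
end

puts strip_marks('Xin chào, Tôi là Phan Thái Phong')
# => Xin chao, Toi la Phan Thai Phong
```

This trades the explicit lookup table for Unicode normalization, so it handles both precomposed and decomposed input without listing every accented character.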