yokawasa/8e11d9793076ef3b743f928f116863c0
Created May 15, 2017 04:05
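String normalization sample using NFKC (Normalization Form Compatibility Composition) with unicodedata.normalize: normalizes half-width katakana, full-width symbols, voiced sound marks (dakuten), special characters, and more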
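NFKC applies compatibility decomposition and then canonical composition. NFKD stops after the decomposition step, so a voiced sound mark stays a separate combining character instead of being merged back into the kana. A minimal comparison, assuming Python 3 and the half-width input "ｻﾞ" chosen here for illustration:

```python
import unicodedata

# "ｻﾞ": half-width katakana SA (U+FF7B) + half-width voiced sound mark (U+FF9E)
half_width_za = "\uff7b\uff9e"

nfkd = unicodedata.normalize("NFKD", half_width_za)
nfkc = unicodedata.normalize("NFKC", half_width_za)

# NFKD: compatibility decomposition only -> full-width SA (U+30B5) + combining mark (U+3099)
print([hex(ord(c)) for c in nfkd])  # ['0x30b5', '0x3099']
# NFKC: decomposition followed by canonical composition -> single ZA (U+30B6)
print([hex(ord(c)) for c in nfkc])  # ['0x30b6']
```

This is why the sample below uses 'NFKC' rather than 'NFKD' when the goal is one composed character per voiced kana.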
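# -*- coding: utf-8 -*-
"""
Normalize half-width katakana, full-width alphanumerics, voiced sound marks (dakuten),
special characters, and more with NFKC (Normalization Form Compatibility Composition)
via unicodedata.normalize
"""
import unicodedata

data = u"㈱㍉㌶ (%&!?@#)カタカナザザザザザア"
# NFKC: compatibility decomposition followed by canonical composition,
# so separated voiced sound marks are recombined into single kana
normal = unicodedata.normalize('NFKC', data)
print(normal)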
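To see the individual mappings, the sketch below (Python 3.6+ for f-strings) spells each input out with escape sequences so that half-width and combining forms are unambiguous. The sample pairs are illustrative choices, not taken from the original gist; each expected value follows from the Unicode compatibility decompositions of the input characters.

```python
import unicodedata

# Each pair: (input written with escapes, expected NFKC output)
samples = [
    ("\u3231", "(株)"),                     # ㈱ parenthesized ideograph "stock"
    ("\u3349", "ミリ"),                     # ㍉ square "miri"
    ("\u3336", "ヘクタール"),               # ㌶ square "hekutaaru"
    ("\uff76\uff80\uff76\uff85", "カタカナ"),  # half-width katakana
    ("\uff7b\uff9e", "\u30b6"),             # half-width SA + voiced mark -> ZA
    ("\u30b5\u3099", "\u30b6"),             # SA + combining voiced mark -> ZA
    ("\uff08\uff05\uff06\uff01\uff1f\uff20\uff03\uff09", "(%&!?@#)"),  # full-width symbols
    ("\uff21\uff22\uff23\uff11\uff12\uff13", "ABC123"),  # full-width alphanumerics
    ("\u3000", " "),                        # ideographic (full-width) space
]

for src, expected in samples:
    result = unicodedata.normalize("NFKC", src)
    print(f"{src!r} -> {result!r}")
    # Fail loudly if NFKC does not produce the expected mapping
    assert result == expected
```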