@yokawasa
Created May 15, 2017 04:05
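String normalization sample using unicodedata.normalize with NFKC (Normalization Form Compatibility Composition): normalizes half-width katakana, full-width symbols, voiced sound marks (dakuten), special characters, and so on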
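# -*- coding: utf-8 -*-
"""
Normalize half-width katakana, full-width alphanumerics, voiced sound marks
(dakuten), special characters, and so on with unicodedata.normalize using
NFKC (Normalization Form Compatibility Composition).
"""
import unicodedata

data = u"㈱㍉㌶ (%&!?@#)カタカナザザザザザア"
# The docstring calls for NFKC (compatibility decomposition followed by
# canonical composition). 'NFKD' would leave voiced kana split into a base
# character plus a combining voiced sound mark.
normal = unicodedata.normalize('NFKC', data)
print(normal)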
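
The difference between NFKC and NFKD is easiest to see on a voiced half-width katakana. The following is a minimal sketch, assuming Python 3; the helper name `show_forms` and the sample inputs are illustrative and not part of the original gist. Inputs are written as escapes so the exact code points are unambiguous.

```python
import unicodedata


def show_forms(text):
    """Return the NFKC and NFKD normalizations of text (hypothetical helper)."""
    return {
        "NFKC": unicodedata.normalize("NFKC", text),
        "NFKD": unicodedata.normalize("NFKD", text),
    }


# Half-width KA (U+FF76) followed by the half-width voiced sound mark (U+FF9E).
halfwidth_ga = "\uff76\uff9e"
forms = show_forms(halfwidth_ga)

# NFKC composes the pair into the single code point U+30AC (full-width GA).
# NFKD stops after decomposition: U+30AB (KA) + U+3099 (combining dakuten).
for name, value in forms.items():
    print(name, [hex(ord(ch)) for ch in value])

# Compatibility characters are expanded by both forms.
print(show_forms("\u3231")["NFKC"])  # U+3231 is the parenthesized "kabu" symbol
print(show_forms("\uff21")["NFKC"])  # U+FF21 is the full-width Latin capital A
```

For text that will be compared, searched, or stored, NFKC is usually the more convenient of the two because each voiced kana ends up as one code point.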