@miratcan
Last active February 29, 2016 06:38
A function to find non-Turkish characters in text.
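# -*- coding: utf-8 -*-
# The 29 letters of the Turkish alphabet, upper and lower case.
# Python 2: the UTF-8 byte string is decoded into a unicode object.
alpha = 'ABC\xc3\x87DEFG\xc4\x9eHI\xc4\xb0JKLMNO\xc3\x96PRS\xc5\x9eTU' \
        '\xc3\x9cVYZabc\xc3\xa7defg\xc4\x9fh\xc4\xb1ijklmno\xc3' \
        '\xb6prs\xc5\x9ftu\xc3\xbcvyz'.decode('utf-8')


def invalid_chars(text, charset=alpha):
    # Return the set of characters in text that are not in charset.
    return set(c for c in text if c not in charset)


invalid_chars(u'üğüğüp0*2')  # a set containing u'*', u'0' and u'2'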
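The snippet above targets Python 2, where a plain string literal is a byte string and has to be decoded. Under Python 3, string literals are already Unicode and have no `.decode` method, so a roughly equivalent version might look like the sketch below. The constant name `TURKISH_ALPHABET` is a hypothetical choice and does not come from the original gist.

```python
# Python 3 sketch: str literals are already Unicode, so no .decode() is needed.
# TURKISH_ALPHABET is a hypothetical name, not taken from the original gist.
TURKISH_ALPHABET = frozenset(
    "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ"
    "abcçdefgğhıijklmnoöprsştuüvyz"
)


def invalid_chars(text, charset=TURKISH_ALPHABET):
    """Return the set of characters in text that are not in charset."""
    return set(text) - set(charset)


print(sorted(invalid_chars("üğüğüp0*2")))  # ['*', '0', '2']
```

Because the character set holds only letters, spaces, digits and punctuation are reported as invalid too; pass a wider `charset` if those should be accepted.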