@codeb2cc
Created February 20, 2013 08:56
Remove control characters
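import unicodedata
import re

# Python 2 code (uses unichr and xrange).
# Collect every code point whose Unicode general category is Cc (control).
chars = (unichr(i) for i in xrange(0x110000))
cc = ''.join(c for c in chars if unicodedata.category(c) == 'Cc')
# or equivalently and much more efficiently:
# the Cc category is exactly U+0000-U+001F and U+007F-U+009F
cc = ''.join(map(unichr, range(0, 32) + range(127, 160)))
# Character class that matches any single control character
cc_re = re.compile('[%s]' % re.escape(cc))

def remove_cc(s):
    """Return s with all control characters removed."""
    return cc_re.sub('', s)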
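
The snippet above targets Python 2. As a rough sketch of how the same idea might look on Python 3 (an assumption, not part of the original gist), `unichr` becomes `chr` and the two ranges have to be joined explicitly, since `range` objects no longer support `+`. The names `CONTROL_CHARS` and `CONTROL_RE` are made up for this illustration.

```python
import re
import unicodedata

# Hypothetical Python 3 port of the gist above; names are illustrative.
# The Cc category covers U+0000-U+001F and U+007F-U+009F.
CONTROL_CHARS = ''.join(map(chr, [*range(0, 32), *range(127, 160)]))

# Character class that matches any single control character.
CONTROL_RE = re.compile('[%s]' % re.escape(CONTROL_CHARS))


def remove_cc(s):
    """Return s with all Cc (control) characters removed."""
    return CONTROL_RE.sub('', s)


# Sanity check: every character in the class really is in category Cc.
assert all(unicodedata.category(c) == 'Cc' for c in CONTROL_CHARS)

if __name__ == '__main__':
    print(repr(remove_cc('a\x00b\x7fc\x9fd')))
```

Note that tab, newline, and carriage return are also in category Cc, so this strips them too. If line breaks need to survive, they would have to be left out of the character class.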