@hepplerj
Created May 13, 2016 16:12
OCR cleanup check
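#!/usr/bin/env python
# Count every character that is not whitespace, a word character, or common
# ASCII punctuation, to surface likely OCR debris in a text file.
import collections, pprint, re, sys

with open(sys.argv[1], 'r') as file_in:
    # Strip the expected characters; whatever remains is suspect.
    chars = re.sub(r'[\s\w!"#$%&()*+,\-./:;<=>?@^_`{|}~\[\]\'\\]', '', file_in.read())
counts = collections.Counter(chars)
pprint.pprint(counts.most_common())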
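
To try the check without a file on disk, the same idea can be wrapped in a function and run on a string. This is only a sketch, assuming Python 3 (where `\w` also matches accented letters); the names `EXPECTED`, `suspect_characters`, and `sample` are hypothetical and not part of the gist.

```python
import collections
import re

# Character class matching the script above: whitespace, word characters,
# and common ASCII punctuation. Anything else is treated as suspect.
EXPECTED = re.compile(r'[\s\w!"#$%&()*+,\-./:;<=>?@^_`{|}~\[\]\'\\]')


def suspect_characters(text):
    """Return (character, count) pairs for characters outside the expected set."""
    leftovers = EXPECTED.sub('', text)
    return collections.Counter(leftovers).most_common()


# Hypothetical sample with two kinds of stray symbols.
sample = "The quick ¬ brown fox ■ jumps ¬ over the lazy dog."
print(suspect_characters(sample))
```

Characters that top the resulting list are the first candidates for a cleanup pass over the OCR output.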