@agritheory
Last active September 27, 2018 01:16
Tesseract + Python Stub for Python project night
# install tesseract per the instructions for your OS
# ubuntu: sudo apt-get install tesseract-ocr
# sudo pip install pytesseract
# sudo pip install pillow
# sudo pip install python-dateutil
# sudo pip install python-Levenshtein
# sudo pip install fuzzywuzzy
# place the image(s) in the working directory
import re
import pytesseract
from PIL import Image  # the pillow package is imported as PIL
from fuzzywuzzy import fuzz # https://stackoverflow.com/a/28467760
from dateutil import parser
text = pytesseract.image_to_string(Image.open('image.jpg'))
print(text)
# TODO: wrap a series of image manipulations in try/except blocks and retry
#       until the OCR output can be resolved
# TODO: use pillow to "enhance" the image before OCR
# TODO: regex (?) to find company and total # https://regexr.com
#
# search the OCR text for vendor name, total and date
#
# (needs a Frappe site context, so left commented out)
# vendor_list = [s["name"] for s in frappe.get_list("Supplier")]
# best_match = max(vendor_list, key=lambda name: fuzz.ratio(name, text))
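
The search for vendor name, total, and date that the comments above describe could look roughly like the sketch below. It is only one possible approach, not the gist author's implementation. The sample text, the vendor list, and the helper names `find_total`, `find_date`, and `best_vendor` are all hypothetical. To keep the sketch runnable without extra packages it uses the standard library's `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio`, and a fixed month/day/year pattern instead of `dateutil.parser`.

```python
import re
from datetime import datetime
from difflib import SequenceMatcher

# Hypothetical OCR output and vendor list, made up for illustration.
# In the stub above, `text` would come from pytesseract.image_to_string().
SAMPLE_TEXT = """ACME Hardwre Co.
Date: 09/26/2018
Widget      4.50
TOTAL     $12.34
"""
VENDORS = ["Acme Hardware Co.", "Globex Corporation", "Initech"]


def find_total(text):
    """Return the amount on the first line beginning with 'total', or None."""
    match = re.search(
        r"^[ \t]*total[^\d\n]*(\d+(?:\.\d{2})?)",
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    return float(match.group(1)) if match else None


def find_date(text):
    """Return the first MM/DD/YYYY date found, or None.

    Assumption: receipts use US-style month/day/year ordering.
    """
    match = re.search(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(0), "%m/%d/%Y").date()
    except ValueError:  # looks like a date but is not a valid one
        return None


def best_vendor(text, vendors):
    """Return the vendor whose name is most similar to any single line of text."""
    lines = [line.strip().lower() for line in text.splitlines() if line.strip()]

    def score(vendor):
        # Best similarity between this vendor name and any one line.
        return max(
            (SequenceMatcher(None, vendor.lower(), line).ratio() for line in lines),
            default=0.0,
        )

    return max(vendors, key=score, default=None)


if __name__ == "__main__":
    print(best_vendor(SAMPLE_TEXT, VENDORS))
    print(find_total(SAMPLE_TEXT))
    print(find_date(SAMPLE_TEXT))
```

Comparing each vendor name against individual lines, rather than the whole receipt, keeps a short name from being penalized by the length of the full OCR text. With fuzzywuzzy installed, `fuzz.ratio` (or `fuzz.partial_ratio`) could be swapped in for the `SequenceMatcher` call.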