Last active
September 27, 2018 01:16
Tesseract + Python Stub for Python project night
# install tesseract per the instructions for your OS
# ubuntu: sudo apt-get install tesseract-ocr
# pip install pytesseract pillow python-dateutil
# pip install python-Levenshtein
# pip install fuzzywuzzy
# load image(s) in working directory
import re
import pytesseract
from PIL import Image  # Pillow is installed as "pillow" but imported as PIL
from fuzzywuzzy import fuzz  # https://stackoverflow.com/a/28467760
from dateutil import parser

# run OCR on the image and print the recognized text
text = pytesseract.image_to_string(Image.open('image.jpg'))
print(text)
# write a series of try/except patterns with file manipulation until it can resolve
# use pillow to "enhance"
# regex? to find company and total  # https://regexr.com
#
# search for vendor name, total and date
#
# vendor_list = map(lambda x: x["name"], frappe.get_list("Supplier"))
# best_match = # reduce ? lambda x: fuzz.ratio(x
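
The notes above leave the vendor, total and date search open. One possible shape for it is sketched below, as an illustration rather than the intended implementation. Everything in it is an assumption: `SAMPLE_TEXT` and `VENDORS` are made-up stand-ins for the OCR output and the Supplier list, the helper names are hypothetical, and `difflib.SequenceMatcher` from the standard library is swapped in for `fuzz.ratio` so the sketch runs without third-party packages.

```python
import re
from difflib import SequenceMatcher

# Hypothetical OCR output; in the stub this would be the `text` variable.
SAMPLE_TEXT = """ACME Hardwarre Co.
Date: 09/26/2018
Widget        12.50
TOTAL        $13.44
"""

# Hypothetical vendor names; the stub's notes suggest a Supplier list instead.
VENDORS = ["Acme Hardware Co.", "Globex Corporation", "Initech"]


def similarity(a, b):
    """Case-insensitive similarity in [0, 1]; a stand-in for fuzz.ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_vendor(text, vendors):
    """Return the vendor most similar to any non-empty line, or None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not vendors:
        return None
    return max(vendors, key=lambda v: max(similarity(v, line) for line in lines))


def find_total(text):
    """Return the amount on the first line starting with 'total', or None."""
    match = re.search(
        r"^\s*total\b[^\d\n]*(\d+(?:[.,]\d{2})?)",
        text,
        re.IGNORECASE | re.MULTILINE,
    )
    return float(match.group(1).replace(",", ".")) if match else None


def find_date(text):
    """Return the first dd/mm/yyyy-shaped string found, or None."""
    match = re.search(r"\b\d{1,2}/\d{1,2}/\d{4}\b", text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(best_vendor(SAMPLE_TEXT, VENDORS))
    print(find_total(SAMPLE_TEXT))
    print(find_date(SAMPLE_TEXT))
```

The string returned by `find_date` could then be handed to `parser.parse` from `dateutil` to get a `datetime`. The regexes only cover the layout of the sample text, so real receipts would likely need more patterns.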