Neat used to sell an all-in-one scanner product; it looks like they now focus more on software and cloud processing solutions: https://www.neat.com/track-receipts/
The software can drive scanners that support the TWAIN API: https://en.wikipedia.org/wiki/TWAIN
They shifted their focus from individuals to small businesses, likely because that market segment is larger. They used to have an offline solution that bundled the scanner and software in one package. Now they seem to be more cloud oriented.
https://www.amazon.com/NeatReceipts-Mobile-Scanner-Digital-Filing/dp/B001CQFRPO
Lots of folks complain about the hardware and software going into a legacy state and no longer being supported. Folks also complain about being redirected to cloud subscription-based solutions, drivers being removed from the download pages, etc.
Interesting feature: maps expenses to the IRS tax categories used in Schedules A, B, and C.
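A rough sketch of what such a category mapping could look like. The category names, their schedule assignments, and the `totals_by_schedule` helper are all made up for illustration; nothing here reflects how Neat actually does it.

```python
from collections import defaultdict

# Hypothetical mapping from receipt categories to IRS schedules.
# Category names and assignments are illustrative assumptions only.
CATEGORY_TO_SCHEDULE = {
    "charitable_donation": "Schedule A",
    "medical": "Schedule A",
    "office_supplies": "Schedule C",
    "advertising": "Schedule C",
}


def totals_by_schedule(receipts):
    """Sum receipt amounts (in cents) per schedule.

    Receipts whose category has no mapping are collected under "Unmapped"
    so they can be reviewed by hand.
    """
    totals = defaultdict(int)
    for receipt in receipts:
        schedule = CATEGORY_TO_SCHEDULE.get(receipt["category"], "Unmapped")
        totals[schedule] += receipt["amount_cents"]
    return dict(totals)
```

Amounts are kept as integer cents to avoid float rounding when summing.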
Microsoft flexing some prebuilt AI models shared publicly through proprietary tools.
https://docs.microsoft.com/en-us/ai-builder/prebuilt-receipt-processing
Demo video: Receipt Processing using AI Builder in Power Apps
There seem to be two routes of usage: Power Apps and Power Automate.
Use the receipt processing prebuilt model in Power Automate: https://docs.microsoft.com/en-us/ai-builder/flow-receipt-processing This is a cloud-based solution: you upload the receipt artifact and it pushes the parsed results to an online Excel spreadsheet.
ScanSnap Receipts scanning demo on YouTube showing the hardware and software a bit. Walks through the parsed columns, and in the high-def video you can see the receipt text scrolling in the app, which for the most part matches the parsed data. Not a perfect job, but it looks like a good starting point. https://www.youtube.com/watch?v=yuaToPhDT34
Official product pages:
- Hardware: https://www.fujitsu.com/us/products/computing/peripheral/scanners/soho/
- Software: https://www.fujitsu.com/us/products/computing/peripheral/scanners/soho/sshome/
- Specifications: https://www.fujitsu.com/us/products/computing/peripheral/scanners/soho/sshome/#tab-b-03

Has assignable configurations per document source that remember which fields go into file names and other data fields. Configurations can be recalled from the touch screen to swap to other settings, so it is essentially a semi-automated approach.
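To make the per-source configuration idea concrete, here is a minimal sketch of how such profiles might be modeled. The `ScanProfile` class, the profile names, and the filename templates are my own assumptions, not ScanSnap's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanProfile:
    """A hypothetical per-document-source configuration."""
    name: str
    filename_template: str  # placeholders are filled from parsed fields
    destination: str        # folder the scanned file would be saved into


# Assumed example profiles; a real setup would define its own.
PROFILES = {
    "receipts": ScanProfile("receipts", "{date}_{vendor}_{total}.pdf", "Receipts"),
    "invoices": ScanProfile("invoices", "{date}_{vendor}_invoice.pdf", "Invoices"),
}


def build_path(profile_name, fields):
    """Build the output path for a scan using the selected profile."""
    profile = PROFILES[profile_name]
    # str.format ignores keys the template does not use.
    filename = profile.filename_template.format(**fields)
    return f"{profile.destination}/{filename}"
```

Picking a profile by name stands in for recalling a configuration from the touch screen; the parsed fields then decide the file name.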
Interesting comparison to VueScan. Many people lost out when 32-bit support was dropped and no longer had support for their old scanners. Folks began to look elsewhere, and VueScan, one of the classic scanning apps, was evaluated by this fella: https://tidbits.com/2019/12/02/vuescan-not-the-scansnap-replacement-youre-looking-for/
Tesseract, a Google-sponsored OCR project, has gone through many iterations and refinements and now uses an LSTM-based deep learning engine to improve recognition.
- https://github.com/tesseract-ocr/tesseract
- https://en.wikipedia.org/wiki/Tesseract_(software)
- https://opensource.google/projects/tesseract
Receipt applications with tesseract: https://stackoverflow.com/questions/31633403/tesseract-receipt-scanning-advice-needed
11 questions tagged both receipt and ocr: https://stackoverflow.com/search?q=%5Breceipt%5D%5Bocr%5D&searchOn=3
Ability to train your own custom model, e.g. for your own language. Differentiates between fixed and non-fixed font widths.
Great blog post about receipt digitization workflow, processes, and background. https://nanonets.com/blog/receipt-ocr/
Mentions downloading the tesseract binaries and installing the Python pytesseract bindings. Shows a full example that gets into text parsing using regex patterns.
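A minimal sketch of that OCR-then-regex flow (not the blog post's code). It assumes the tesseract binary plus the `pytesseract` and Pillow packages are installed for the OCR step; the two regex patterns and the `parse_receipt_text` helper are my own guesses at a simple receipt layout, and real receipts will need more patterns.

```python
import re


def ocr_image(path):
    """Run Tesseract on an image file and return the raw text.

    Assumes the tesseract binary, pytesseract, and Pillow are installed.
    Imported lazily so the parsing below works without them.
    """
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))


# Assumed patterns for a simple US-style receipt.
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b")
# Anchored to the line start so "SUBTOTAL" lines are not picked up.
TOTAL_RE = re.compile(
    r"^\s*TOTAL\b[^\d\n]*(\d+\.\d{2})\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def parse_receipt_text(text):
    """Pull the first date and the TOTAL amount out of OCR'd receipt text."""
    date_match = DATE_RE.search(text)
    total_match = TOTAL_RE.search(text)
    return {
        "date": date_match.group(1) if date_match else None,
        "total": total_match.group(1) if total_match else None,
    }
```

Typical use would be `parse_receipt_text(ocr_image("receipt.png"))`, where the file name is a placeholder. Fields that cannot be found come back as `None` so they can be filled in by hand.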
The post goes through various models that explore this space and in the end redirects you to a pretrained OCR API you can purchase.
"The rule-based methods rely heavily on the predefined template rules to extract information from specific invoice layouts." This was an interesting bit, similar to how I parse the ecommerce sites I use most often with custom scrapers to extract the line item purchase data I'm after. So automate the most frequent stores, and whatever is left over is doable as manual entry.
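The "automate the frequent stores, hand-enter the rest" idea could be sketched roughly like this. The store names, receipt layouts, and regex templates are invented for illustration; each real store would need its own template.

```python
import re

# Hypothetical per-store templates: one pattern to recognize the store,
# one pattern for its line items. Layouts here are made up.
TEMPLATES = {
    "acme": {
        "match": re.compile(r"ACME MARKET", re.IGNORECASE),
        "line_item": re.compile(
            r"^(?P<qty>\d+) x (?P<name>.+?)\s+(?P<price>\d+\.\d{2})$",
            re.MULTILINE,
        ),
    },
    "globex": {
        "match": re.compile(r"Globex Online", re.IGNORECASE),
        "line_item": re.compile(
            r"^Item: (?P<name>.+?) \| \$(?P<price>\d+\.\d{2})$",
            re.MULTILINE,
        ),
    },
}


def extract_line_items(text):
    """Return (store, items) using the first template that recognizes the text.

    Returns (None, []) when no template matches, which signals that the
    receipt should fall back to manual entry.
    """
    for store, template in TEMPLATES.items():
        if template["match"].search(text):
            items = [m.groupdict() for m in template["line_item"].finditer(text)]
            return store, items
    return None, []
```

Adding a store means adding one entry to `TEMPLATES`; anything unmatched simply lands in the manual pile.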
Looks like a subscription-based cloud solution: https://nanonets.com/pricing/
Hits on Hacker News: https://news.ycombinator.com/item?id=21843342
- https://github.com/invoice-x/invoice2data
  - Python Invoice Data Extractor from PDF | Invoice2data - Pdf2text Part 1
  - Review of the tool: https://medium.com/version-1/my-experience-extracting-invoice-data-using-invoice2data-in-python-1c6450fa001f
Comment from another thread: "Earlier this year I was working on hybrid PDFs that embed a full XML invoice. Standardized and promoted by the German and French. One more thing to hide." https://news.ycombinator.com/item?id=18383558
https://www.waveapps.com/receipts https://en.wikipedia.org/wiki/Wave_Financial
- https://www.youtube.com/watch?v=Gnt--fjvbuY https://techcrunch.com/2013/04/10/wave-accounting-free-receipt-scanning
https://www.abbyy.com/cloud-ocr-sdk/ mentioned by someone looking at their R package: https://cran.r-project.org/web/packages/abbyyR/index.html https://github.com/soodoku/abbyyR
https://www.pcmag.com/reviews/zoho-expense Zoho Expense: Overview
- https://en.wikipedia.org/wiki/Invoice_processing#Automatic_process
- https://github.com/SmartReceipts/SmartReceiptsLibrary
- https://www.xero.com/us/features-and-tools/accounting-software/expenses/receipt-scanning/
- https://www.freshbooks.com/hub/productivity/receipt-scanning-apps
- https://www.shoeboxed.com/
- https://play.google.com/store/apps/details?id=com.easyexpense&hl=en_US&gl=US
- https://github.com/MrFinchh/Receipt-OCR
- https://www.klippa.com/en/ocr/financial-documents/receipts
- https://www.blinkreceipt.com/
- https://help.rydoo.com/en/articles/4611416-single-receipt-multiple-receipts-scanning
https://news.ycombinator.com/item?id=18199708 https://pdftables.com/ https://camelot-py.readthedocs.io/en/master/ https://tomassetti.me/how-to-convert-a-pdf-to-excel/