
Jordan Schatz shofetim

shofetim / README.md
Created March 24, 2016 23:06 — forked from dannguyen/README.md
Using Google Cloud Vision API to OCR scanned documents to extract structured data

Using Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
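For reference, a request of the kind this test makes can be sketched as follows. This is not the original script. It assumes the v1 REST endpoint `images:annotate`, an API key passed as the `key` query parameter, the `TEXT_DETECTION` feature, and an image sent inline as base64. `build_ocr_request` and `ocr_image` are hypothetical helper names.

```python
import base64

# Cloud Vision v1 REST endpoint for annotation requests.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Return the JSON body for one TEXT_DETECTION request (image sent inline as base64)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def ocr_image(path, api_key):
    """Hypothetical helper: POST one image and return all detected text as a single string."""
    import requests  # third-party: pip install requests

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body, timeout=30)
    resp.raise_for_status()
    result = resp.json()["responses"][0]
    annotations = result.get("textAnnotations", [])
    # When text is found, the first annotation holds the whole detected text.
    return annotations[0]["description"] if annotations else ""
```

Calling `ocr_image("scan.png", "YOUR_API_KEY")` (both placeholders) returns the raw text, which is what the quality check below is judged on.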

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you'd need in order to calculate the data delimiters.
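To make the missing piece concrete: if the API did return one box per word, recovering the table structure could look roughly like this. It's a hypothetical sketch. The `(text, left, top, right)` tuple format, `words_to_rows`, and the pixel tolerances are assumptions, not Cloud Vision output fields.

```python
from typing import List, Tuple

# Hypothetical word-level box: (text, left_x, top_y, right_x), in pixels.
Word = Tuple[str, int, int, int]


def words_to_rows(words: List[Word], row_tol: int = 10, col_gap: int = 30) -> List[List[str]]:
    """Group word boxes into table rows and cells.

    Words whose top edge is within `row_tol` pixels of a row's first word share
    that row; inside a row, a horizontal gap wider than `col_gap` starts a new cell.
    """
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(word[2] - rows[-1][0][2]) <= row_tol:
            rows[-1].append(word)
        else:
            rows.append([word])

    table = []
    for row in rows:
        row.sort(key=lambda w: w[1])  # left to right
        cells = [row[0][0]]
        for prev, cur in zip(row, row[1:]):
            if cur[1] - prev[3] > col_gap:
                cells.append(cur[0])        # wide gap: treat as a column delimiter
            else:
                cells[-1] += " " + cur[0]   # narrow gap: same cell
        table.append(cells)
    return table
```

The delimiters here are inferred purely from whitespace between boxes, which is why per-word coordinates are the prerequisite.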

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

1. A low-resolution photo of road signs

(ns fb.admin
  (:require-macros [cljs.core.async.macros :refer [go]])
  (:require [om.core :as om :include-macros true]
            [om-tools.dom :as dom :include-macros true]))

(defn play []
  (print "play clicked"))

(defn pause []
  (print "pause clicked"))

Events:
  FirstSeen  LastSeen  Count  From                                                 SubobjectPath  Reason      Message
  ─────────  ────────  ─────  ────                                                 ─────────────  ──────      ───────
  7m         7m        1      {scheduler }                                                        Scheduled   Successfully assigned bobosales-rc-3aos8 to ip-172-20-0-87.us-west-2.compute.internal
  7m         1s        46     {kubelet ip-172-20-0-87.us-west-2.compute.internal}                 FailedSync  Error syncing pod, skipping: mkdir /mnt/ephemeral/kubernetes/kubelet/pods/1c516e1e-e7c7-11e5-a56a-02e4b8c2a487: read-only file system

Code Samples

The samples are lifted from various production systems, and as such are not "polished".

Something in Clojure: this was part of a web service that detected whether the data it was sent was encrypted.

(ns store.chisq
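The Clojure itself isn't shown here, but the namespace name `store.chisq` suggests a chi-squared test on byte frequencies. Assuming that is the approach, the idea can be sketched in Python. `byte_chi_square`, `looks_encrypted`, and the 300.0 cutoff are hypothetical and are not taken from the original service.

```python
from collections import Counter


def byte_chi_square(data: bytes) -> float:
    """Pearson chi-squared statistic of byte frequencies against a uniform distribution.

    Encrypted data has nearly uniform byte frequencies, so on reasonably large
    inputs the statistic stays near its 255 degrees of freedom; text and other
    structured data score much higher.
    """
    if not data:
        raise ValueError("need at least one byte")
    expected = len(data) / 256
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(256))


def looks_encrypted(data: bytes, threshold: float = 300.0) -> bool:
    # Hypothetical cutoff, roughly two standard deviations above the mean of a
    # chi-squared distribution with 255 degrees of freedom.
    return byte_chi_square(data) < threshold
```

One design caveat: well-compressed data is also close to uniform, so a test like this flags "high entropy" rather than strictly "encrypted".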
# -*- coding: utf-8 -*-
import datetime
from south.db import db
from south.v2 import SchemaMigration
from django.db import models


class Migration(SchemaMigration):

    def forwards(self, orm):

Automation Checklist

Physical System

  • Working air supply?
  • Lines vented, checked for leaks.
  • Bins, belts, and area around sortation system clear of debris.
  • All conveyors can move?
  • Power to accumulation & reject tables

import datetime
from time import sleep


class A(object):
    @property
    def b(self):
        # Recomputed on every attribute access, so each read returns the current time.
        return datetime.datetime.now()


a = A()
print(a.b)

from products.models import Product
from reports.models import (OutOfStockReport, ProjectedStockTransfers,
                            ProjectedStockTransferScores, LandedCostReport)
from datetime import date, timedelta

now = date.today()
# "Months" are approximated as 30-day steps.
plus_1_month = now + timedelta(days=30)
plus_2_month = now + timedelta(days=60)
plus_3_month = now + timedelta(days=90)

# Deletes every existing LandedCostReport row.
LandedCostReport.objects.all().delete()

(ns nmk.account
  (:require-macros [cljs.core.async.macros :refer [go]])
  (:require [om.core :as om :include-macros true]
            [cljs.core.async :refer [put! chan <! >!]]
            [om-tools.dom :as dom :include-macros true]
            [om-tools.core :refer-macros [defcomponent]]
            [om-bootstrap.table :refer [table]]
            [om-bootstrap.random :as r]
            [nmk.cart :as cart]
            [goog.dom.xml :as xml]))

;; A map is also a function of its keys, so mapv can use it to look up several keys at once.
(mapv {:6001 "The Structure and Interpretation of Computer Programs"
       :6946 "The Structure and Interpretation of Classical Mechanics"
       :1806 "Linear Algebra"}
      [:6001 :6946])
;; => ["The Structure and Interpretation of Computer Programs"
;;     "The Structure and Interpretation of Classical Mechanics"]