fabio fumarola fabiofumarola

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which is what you would need to calculate the data delimiters.
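For reference, here is a minimal sketch of the kind of request involved, assuming the public REST endpoint `images:annotate` with an API key and the `TEXT_DETECTION` feature. The helper names (`build_ocr_request`, `extract_text`, `ocr_image`) are made up for illustration and are not taken from the original test script.

```python
import base64

# Public REST endpoint; the API key is sent as the `key` query parameter.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes: bytes) -> dict:
    """Build the JSON body for a single TEXT_DETECTION request (hypothetical helper)."""
    return {
        "requests": [
            {
                # Image bytes are sent inline as base64 text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_text(response_json: dict) -> str:
    """Return the full detected text from an annotate response, or '' if none."""
    annotations = response_json["responses"][0].get("textAnnotations", [])
    # The first annotation carries the whole detected text; later ones are fragments.
    return annotations[0]["description"] if annotations else ""


def ocr_image(path: str, api_key: str) -> str:
    """Send one image file to the API and return the detected text."""
    import requests  # third-party; imported lazily so the helpers above work without it

    with open(path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body, timeout=30)
    resp.raise_for_status()
    return extract_text(resp.json())
```

Calling `ocr_image("signs.jpg", api_key)` (both arguments are placeholders) would return the detected text as one string, which is enough for the "identify text anywhere in an image" use case but says nothing about table structure.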

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.

This is by no means a comprehensive gist. It's also important to understand that the techniques that I outline in here are not 100% absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friends! Mea

	/** @jsx React.DOM */

	// d3 chart function
	// note that this is a higher-order function, which
	// allows passing in the component properties/state
	update = function(props) {
	updateCircle = function(me) {
	me
	.attr("r", function(d) { return d.r; })
	.attr("cx", function(d) { return 3 + d.r; })

	import akka.http.scaladsl.model.HttpMethods._
	import akka.http.scaladsl.model.headers.{`Access-Control-Allow-Credentials`, `Access-Control-Allow-Headers`, `Access-Control-Allow-Methods`, `Access-Control-Allow-Origin`}
	import akka.http.scaladsl.server.directives.RespondWithDirectives

	trait EnableCORSDirectives extends RespondWithDirectives {

	private val allowedCorsVerbs = List(
	CONNECT, DELETE, GET, HEAD, OPTIONS,
	PATCH, POST, PUT, TRACE
	)

	import akka.actor.ActorRef;
	import akka.dispatch.*;
	import org.jctools.queues.MpscArrayQueue;

	/**
	* Non-blocking, multiple-producer, single-consumer, high-performance bounded message queue.
	* This implementation is similar to, but simpler than, the LMAX Disruptor.
	*/
	public final class MpscBoundedMailbox implements MessageQueue {