Oliver oliver006

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used in approximation algorithm implementation.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet.
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t

Keys for obtaining US Driver's license data

The standard for US driver's licenses defines 9 different barcode standards (AAMVA versions), with over 80 different fields encoded inside a barcode. Some fields exist in all barcode standards; others exist only in some. To standardize the API, we have structured the fields into the following sections:

Determining AAMVA version
Keys existing on all barcode versions
- Mandatory values
  - Personal data
  - License data
- Optional values
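As a rough illustration of the "Determining AAMVA version" step, the sketch below reads the version number straight from the barcode header. It assumes the header layout of the AAMVA card design standard ("@", line feed, record separator, carriage return, the file type "ANSI ", a 6-digit issuer identification number, then a 2-digit version number). The function name `aamva_version` and the sample string are made up for this example and are not part of any API described here.

```python
import re

# Assumed header layout: "@", LF, RS (0x1E), CR, the file type "ANSI ",
# a 6-digit issuer identification number, then a 2-digit AAMVA version.
_HEADER = re.compile(r"@\n\x1e\rANSI (\d{6})(\d{2})")


def aamva_version(barcode_text):
    """Return the AAMVA version number from the barcode header, or None."""
    match = _HEADER.match(barcode_text)
    if match is None:
        return None
    return int(match.group(2))


# Made-up sample header, not data from a real license.
sample = "@\n\x1e\rANSI 636000080002DL00410278ZV03190008DL"
print(aamva_version(sample))  # prints 8
```

Once the version is known, a caller can decide which of the version-specific keys are worth looking up.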

WebSockets + Reflux + React

Using WebSockets, React and Reflux together can be a beautiful thing, but the initial setup can be a bit of a pain. The examples below attempt to offer one (arguably enjoyable) way to use these tools together.

Overview

This trifecta works well if you think of things like so:

Reflux Store: The store fetches, updates and persists data. A store can be a list of items or a single item. Most of the state you would otherwise reach for through this.state in React should instead live within stores. Stores can listen to other stores as well as to events being fired.
Reflux Actions: Actions are triggered by components when the component wants to change the state of the store. A store listens to actions and can listen to more than one set of actions.
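The relationship between actions and stores described above can be sketched in a language-neutral way. The snippet below is a minimal Python model of the pattern, not Reflux's actual API: the `Action` and `ItemStore` classes, the `on_<name>` handler convention and the action names are all assumptions made for illustration.

```python
class Action:
    """Something a component triggers when it wants the store to change."""

    def __init__(self):
        self._listeners = []

    def listen(self, callback):
        self._listeners.append(callback)

    def trigger(self, *args):
        for callback in self._listeners:
            callback(*args)


class ItemStore:
    """Holds a list of items and can listen to more than one set of actions."""

    def __init__(self, *action_sets):
        self.items = []
        self._subscribers = []
        for actions in action_sets:
            for name, action in actions.items():
                # Wire each action to the matching on_<name> handler, if any.
                handler = getattr(self, "on_" + name, None)
                if handler is not None:
                    action.listen(handler)

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def on_add_item(self, item):
        self.items.append(item)
        self._notify()

    def on_clear(self):
        self.items = []
        self._notify()

    def _notify(self):
        # Hand subscribers a copy so they cannot mutate the store's state.
        for callback in self._subscribers:
            callback(list(self.items))


item_actions = {"add_item": Action()}
admin_actions = {"clear": Action()}
store = ItemStore(item_actions, admin_actions)

seen = []
store.subscribe(seen.append)
item_actions["add_item"].trigger("first")
item_actions["add_item"].trigger("second")
admin_actions["clear"].trigger()
print(seen)  # [['first'], ['first', 'second'], []]
```

The point of the design is that components never touch the store's data directly: they trigger actions and re-render from whatever the store publishes.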

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.
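For reference, a request of the kind described here can be sketched as follows. This is a minimal sketch rather than the script used for the test: it assumes the REST endpoint `images:annotate` with a `TEXT_DETECTION` feature and an API key passed as a query parameter, and the function names `build_ocr_request` and `detect_text` are made up for this example.

```python
import base64

ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def detect_text(image_bytes, api_key):
    """Send the image to Cloud Vision and return the full detected text."""
    # Requests is third-party; importing it here keeps the payload builder
    # above usable without it.
    import requests

    response = requests.post(
        ANNOTATE_URL,
        params={"key": api_key},
        json=build_ocr_request(image_bytes),
        timeout=30,
    )
    response.raise_for_status()
    annotations = response.json()["responses"][0].get("textAnnotations", [])
    # When text is found, the first annotation holds the whole detected text.
    return annotations[0]["description"] if annotations else ""
```

Calling `detect_text(open("scan.jpg", "rb").read(), api_key)` with a valid key would return the recognized text as one string; the file name here is only a placeholder.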

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

Things I believe

This is a collection of the things I believe about software development. I have worked for years building backend and data processing systems, so read what follows within that context.

Agree? Disagree? Feel free to let me know at @JanStette.

Fundamentals

Keep it simple, stupid. You ain't gonna need it.

	#cloud-config

	coreos:
	  etcd:
	    # generate a new token for each unique cluster from https://discovery.etcd.io/new
	    discovery: https://discovery.etcd.io/<token>
	    # multi-region deployments, multi-cloud deployments, and droplets without
	    # private networking need to use $public_ipv4
	    addr: $private_ipv4:4001
	    peer-addr: $private_ipv4:7001

	#!/usr/bin/env python

	import random
	import struct
	import sys

	# Most of the Fat32 class was cribbed from https://gist.github.com/jonte/4577833

	def ppNum(num):
	    return "%s (%s)" % (hex(num), num)

	package main

	import (
		"net/http"
		"compress/gzip"
		"io/ioutil"
		"strings"
		"sync"
		"io"
	)

	default['sshd']['sshd_config']['AuthenticationMethods'] = 'publickey,keyboard-interactive:pam'
	default['sshd']['sshd_config']['ChallengeResponseAuthentication'] = 'yes'
	default['sshd']['sshd_config']['PasswordAuthentication'] = 'no'