Gurumurthi V Ramanan gurusura

Libraries for a modern geospatial workflow

Distribution

Numpy

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means an exhaustive list.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject.

Topology and Data by Gunnar Carlsson
Barcodes: The Persistent Topology of Data by Robert Ghrist

Other Papers and Web Resources

Extracting insights from the shape of complex data using topology: a good introductory paper in Nature on the Mapper algorithm.

Using Python 3 + Google Cloud Vision API's OCR to extract text from photos and scanned documents

Just a quickie test in Python 3 (using Requests) to see if Google Cloud Vision can be used to effectively OCR a scanned data table and preserve its structure, in the way that products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
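
For orientation, here is a minimal sketch of what such a request might look like. The endpoint and the `TEXT_DETECTION` feature type come from the public Cloud Vision REST API (`images:annotate`). The helper names `build_ocr_request`, `extract_full_text` and `post_ocr_request`, and the `api_key` argument, are my own illustrative assumptions and not code from the original test.

```python
import base64

# Public REST endpoint for Cloud Vision image annotation.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_ocr_request(image_bytes):
    """Build the JSON body for a single-image TEXT_DETECTION request."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [{"type": "TEXT_DETECTION"}],
            }
        ]
    }


def extract_full_text(response_json):
    """Return the full detected text, or '' if nothing was found.

    The first entry of textAnnotations holds the full text.
    """
    responses = response_json.get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0]["description"] if annotations else ""


def post_ocr_request(image_path, api_key):
    """Send one image to the API (hypothetical helper; needs network access)."""
    import requests  # imported lazily so the helpers above work without it

    with open(image_path, "rb") as f:
        body = build_ocr_request(f.read())
    resp = requests.post(VISION_URL, params={"key": api_key}, json=body)
    resp.raise_for_status()
    return resp.json()
```

The payload builder and the response parser are kept separate from the network call so they can be checked without an API key.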

The short answer: No. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide it at the word or region level, which would be needed to then calculate the data delimiters.
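
To make the "data delimiters" point concrete, here is a rough sketch of what I would want to do if word-level coordinates were available. The input format (a list of `(text, x, y)` tuples), the function name `group_into_rows` and the `y_tolerance` threshold are all hypothetical; nothing here is part of the Cloud Vision API.

```python
def group_into_rows(words, y_tolerance=10):
    """Group (text, x, y) word positions into table rows, top to bottom.

    A word joins the current row when its y coordinate is within
    y_tolerance of the first word placed in that row. Within each row,
    words are ordered left to right by x.
    """
    rows = []
    for text, x, y in sorted(words, key=lambda w: w[2]):
        if rows and abs(y - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Made-up coordinates for a two-row, two-column table.
words = [("Name", 10, 12), ("Age", 120, 10), ("Alice", 11, 40), ("30", 121, 43)]
table = group_into_rows(words)
print(table)  # → [['Name', 'Age'], ['Alice', '30']]
```

With rows recovered this way, writing a CSV is straightforward. Without per-word positions there is nothing to group on, which is the limitation described above.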

On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:

###### 1. A low-resolution photo of road signs

	"""
	Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
	BSD License
	"""
	import numpy as np

	# data I/O
	with open('input.txt', 'r') as f: # should be simple plain text file
	    data = f.read()
	chars = list(set(data))
	data_size, vocab_size = len(data), len(chars)
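
The preview stops after the vocabulary is counted. A character-level model then needs each character mapped to an integer index, so a plausible next step is sketched below. The names `build_vocab`, `char_to_ix`, `ix_to_char` and `one_hot` are assumptions made for illustration, the sample string stands in for `input.txt`, and the sketch uses plain Python lists rather than NumPy so that it runs on its own.

```python
def build_vocab(data):
    """Map each distinct character to an index and back."""
    chars = sorted(set(data))  # sorted so the indices are reproducible
    char_to_ix = {ch: i for i, ch in enumerate(chars)}
    ix_to_char = {i: ch for i, ch in enumerate(chars)}
    return char_to_ix, ix_to_char


def one_hot(index, vocab_size):
    """Return a list of zeros with a single 1 at position index."""
    vec = [0] * vocab_size
    vec[index] = 1
    return vec


data = "hello"  # stand-in for the contents of input.txt
char_to_ix, ix_to_char = build_vocab(data)
encoded = [char_to_ix[ch] for ch in data]
print(encoded)  # → [1, 0, 2, 2, 3]
print(one_hot(encoded[0], len(char_to_ix)))  # → [0, 1, 0, 0]
```

Sorting the character set is a choice made here for repeatable output; `list(set(data))` on its own gives no guaranteed order.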

	# Note – this is not a bash script (some of the steps require reboot)
	# I named it .sh just so Github does correct syntax highlighting.
	#
	# This is also available as an AMI in us-east-1 (virginia): ami-cf5028a5
	#
	# The CUDA part is mostly based on this excellent blog post:
	# http://tleyden.github.io/blog/2014/10/25/cuda-6-dot-5-on-aws-gpu-instance-running-ubuntu-14-dot-04/

	# Install various packages
	sudo apt-get update

	# Author: Kyle Kastner
	# License: BSD 3-Clause
	# Implementing http://mnemstudio.org/path-finding-q-learning-tutorial.htm
	# Q-learning formula from http://sarvagyavaish.github.io/FlappyBirdRL/
	# Visualization based on code from Gael Varoquaux
	# http://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html

	import numpy as np
	import matplotlib.pyplot as plt
	from matplotlib.collections import LineCollection
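
The header above only links to the Q-learning formula, so here is a minimal sketch of the standard tabular update, Q(s, a) ← Q(s, a) + α (r + γ max Q(s', ·) − Q(s, a)). This is not the gist's own code: the function name `q_update`, the nested-list table and the default `alpha` and `gamma` values are assumptions, and plain lists are used instead of NumPy so the sketch is self-contained.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.8):
    """Apply one tabular Q-learning update in place and return the new value.

    q is a list of rows, one per state, each holding one value per action.
    """
    best_next = max(q[next_state])  # value of the greedy action in next_state
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Two states, two actions, all values start at zero.
q = [[0.0, 0.0], [0.0, 0.0]]
first = q_update(q, state=0, action=1, reward=100.0, next_state=1)
second = q_update(q, state=1, action=0, reward=0.0, next_state=0)
print(first, second)
```

With `alpha=1` the update reduces to Q(s, a) = r + γ max Q(s', ·), which overwrites the old estimate instead of blending it in.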

	// This script is designed to run on a 1 hour trigger in Google Apps Script. It is also written to "WRITE_TRUNCATE" your table
	// which means it deletes the table and updates it with the newest information. You can change the variables in campaignList
	// if you want to adjust it for your needs.

	function chimpyAPI30days() {
	projectId = "xxx";
	datasetId = "xxx";
	tableId = 'xxx';
	yesterday = new Date();
	yesterday.setDate(yesterday.getDate() - 29); // despite the name, this is the date 29 days ago

	""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
	import numpy as np
	import pickle # on Python 2 this was cPickle
	import gym

	# hyperparameters
	H = 200 # number of hidden layer neurons
	batch_size = 10 # every how many episodes to do a param update?
	learning_rate = 1e-4
	gamma = 0.99 # discount factor for reward
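
To show what the `gamma` discount factor does, here is a small sketch that turns a list of per-step rewards into discounted returns. The function name `discount_rewards` and the use of plain Python lists are assumptions for illustration, and the sketch treats the whole list as a single episode, which may differ from how the full Pong script handles game boundaries.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return the discounted return G_t = r_t + gamma * G_{t+1} for each step."""
    discounted = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


print(discount_rewards([0, 0, 1], gamma=0.5))  # → [0.25, 0.5, 1.0]
```

A reward that arrives late is credited to earlier actions too, but with a weight that shrinks by a factor of `gamma` per step.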

	'''This script goes along the blog post
	"Building powerful image classification models using very little data"
	from blog.keras.io.
	It uses data that can be downloaded at:
	https://www.kaggle.com/c/dogs-vs-cats/data
	In our setup, we:
	- created a data/ folder
	- created train/ and validation/ subfolders inside data/
	- created cats/ and dogs/ subfolders inside train/ and validation/
	- put the cat pictures index 0-999 in data/train/cats

	import numpy as np
	from keras.models import Sequential
	from keras.layers import Activation, Dense

	training_data = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")
	target_data = np.array([[0],[1],[1],[0]], "float32")

	model = Sequential()
	model.add(Dense(32, input_dim=2, activation='relu'))
	model.add(Dense(1, activation='sigmoid'))