Gerenuk

An Aesthetic Comparison of Human-Readable Hashing Functions

The following compares the output of several creative hash functions designed for human readability.

SHA-1 digests are used merely as arbitrary, long, well-distributed input values.

zacharyvoase/humanhash

input	1 word output	2 word output	3 word output
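humanhash-style functions shorten a digest to a few words. They XOR-fold the digest bytes into as many segments as words are requested, then look each resulting byte up in a 256-word list. The following is a minimal sketch of that idea. The word list is a hypothetical placeholder (`w000` through `w255`), not humanhash's curated list. The segment handling is an assumption and is not copied from the library.

```python
from functools import reduce
from operator import xor

# Hypothetical placeholder word list (w000 .. w255); humanhash ships its own.
WORDLIST = [f"w{i:03d}" for i in range(256)]


def compress(digest: bytes, target: int) -> list:
    """XOR-fold the digest bytes into `target` single-byte checksums."""
    if not 0 < target <= len(digest):
        raise ValueError("target must be between 1 and len(digest)")
    seg_size = len(digest) // target
    segments = [digest[i * seg_size:(i + 1) * seg_size] for i in range(target - 1)]
    # The last segment also absorbs any leftover bytes.
    segments.append(digest[(target - 1) * seg_size:])
    return [reduce(xor, segment, 0) for segment in segments]


def humanize(hexdigest: str, words: int = 4) -> str:
    """Map a hex digest to `words` dash-separated words (one word per byte value)."""
    return "-".join(WORDLIST[b] for b in compress(bytes.fromhex(hexdigest), words))
```

For example, `humanize(hashlib.sha1(b"hello").hexdigest(), 3)` yields three placeholder words. Because each output word depends on a whole segment of the digest, changing any input byte changes at least one word.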

Quick List of Resources for Topological Data Analysis with Emphasis on Machine Learning

This is just a quick list of resources on TDA that I put together for @rickasaurus after he asked on Twitter for links to papers, books, etc. It is by no means exhaustive.

Survey Papers

Both Carlsson's and Ghrist's survey papers offer a very good introduction to the subject:

Topology and Data by Gunnar Carlsson
Barcodes: The Persistent Topology of Data by Robert Ghrist
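The barcodes in Ghrist's survey record when topological features are born and when they die as a scale parameter grows. For dimension 0 (connected components) of a Vietoris-Rips filtration, a union-find pass over edges sorted by length is enough to compute the barcode. The following is a minimal pure-Python sketch. It assumes points in Euclidean space and ignores higher-dimensional features, which need a full persistent-homology library. The function name is hypothetical.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence intervals of a Vietoris-Rips filtration.

    Every point is born at scale 0. When an edge of length d first joins two
    components, one of them dies at d. One component never dies (death = inf).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components
            parent[ri] = rj
            bars.append((0.0, d))
    if points:
        bars.append((0.0, math.inf))
    return bars
```

Long bars point to well-separated clusters. Short bars come from points that merge early and are usually read as noise.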

Other Papers and Web Resources

Extracting insights from the shape of complex data using topology : A good introductory paper in Nature on the Mapper algorithm.
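Mapper summarizes a point cloud as a graph in four steps:

1. Choose a filter (lens) function.
2. Cover the filter's range with overlapping intervals.
3. Cluster the points that fall in each interval.
4. Connect clusters that share points.

The following sketch is a simplified illustration, not the implementation from the Nature paper. The filter, the number of intervals, the overlap fraction, and the single-linkage clustering with a fixed distance threshold `eps` are all assumptions, and the function names are hypothetical.

```python
import math


def single_linkage(indices, points, eps):
    """Group indices into clusters: points within eps of each other are linked."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges): nodes are point-index sets, edges join nodes sharing points."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0
    nodes = []
    for k in range(n_intervals):
        # Each interval is widened by `overlap * width` on both sides.
        start = lo + k * width - overlap * width
        end = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(members, points, eps))
    nodes = list(dict.fromkeys(nodes))  # drop clusters repeated across intervals
    edges = {
        (a, b)
        for a in range(len(nodes))
        for b in range(a + 1, len(nodes))
        if nodes[a] & nodes[b]
    }
    return nodes, edges
```

With the x coordinate as the lens, ten evenly spaced points on a line give a path of four nodes. Twelve points on a circle of radius 2 (with `eps=1.1`) give a six-node loop, which recovers the circle's single hole.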

Note: this is a summary of different git workflows, put together as a small git bible. References are given inline in the text.

How to Branch

Try to keep your hacking out of master and create feature branches. The [feature-branch workflow][4] is a good middle ground between noobs ("I have no idea how to branch") and git veterans ("let's do some rocket science with git branches!"). Everybody gets the idea!

Basic usage examples

General Background and Overview

Probabilistic Data Structures for Web Analytics and Data Mining : A great overview of the space of probabilistic data structures and how they are used to implement approximation algorithms.
Models and Issues in Data Stream Systems
Philippe Flajolet’s contribution to streaming algorithms : A presentation by Jérémie Lumbroso that visits some of the historical perspectives and how it all began with Flajolet
Approximate Frequency Counts over Data Streams by Gurmeet Singh Manku & Rajeev Motwani : One of the early papers on the subject.
[Methods for Finding Frequent Items in Data Streams](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.187.9800&rep=rep1&t
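Manku and Motwani's lossy counting algorithm, from the paper listed above, keeps approximate frequency counts in bounded memory. It works as follows:

- The stream is split into buckets of width ceil(1/epsilon).
- Each tracked item stores a count and the maximum number of occurrences it may have missed (delta).
- At each bucket boundary, entries whose count plus delta is at most the current bucket id are discarded.

The following is a minimal pure-Python sketch. The class and method names are hypothetical.

```python
import math


class LossyCounter:
    """Lossy counting sketch: counts underestimate true frequency by at most eps * n."""

    def __init__(self, epsilon):
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, delta]

    def add(self, item):
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # current bucket id
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:  # bucket boundary: prune rare items
            self.entries = {
                k: v for k, v in self.entries.items() if v[0] + v[1] > bucket
            }

    def frequent(self, support):
        """Items whose frequency may reach support * n (no false negatives)."""
        threshold = (support - self.epsilon) * self.n
        return {k: v[0] for k, v in self.entries.items() if v[0] >= threshold}
```

Memory stays proportional to (1/epsilon) * log(epsilon * n) entries rather than to the number of distinct items. In exchange, reported counts may be low by up to epsilon * n.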

	library(tidyverse)

	# Data is downloaded from here:
	# https://www.kaggle.com/c/digit-recognizer
	kaggle_data <- read_csv("~/Downloads/train.csv")

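	# Reshape to long format: one row per (image, pixel) pair.
	pixels_gathered <- kaggle_data %>%
	  mutate(instance = row_number()) %>%                 # id for each image
	  gather(pixel, value, -label, -instance) %>%         # pixelN columns -> rows
	  extract(pixel, "pixel", "(\\d+)", convert = TRUE)   # "pixel12" -> 12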
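The `gather()` and `extract()` pipeline above turns the wide Kaggle table into long format with one row per pixel. In the wide table, each image is a row and each pixel is a `pixel0`, `pixel1`, and so on column. The same reshaping can be sketched in plain Python. The tiny two-image table is hypothetical stand-in data, not the Kaggle file, and the function name is hypothetical.

```python
import re


def gather_pixels(rows):
    """Reshape wide rows {'label': .., 'pixelN': ..} into (instance, label, pixel, value)."""
    long_rows = []
    for instance, row in enumerate(rows, start=1):  # like row_number()
        for column, value in row.items():
            if column == "label":
                continue
            pixel = int(re.search(r"(\d+)", column).group(1))  # like extract(convert = TRUE)
            long_rows.append((instance, row["label"], pixel, value))
    return long_rows


wide = [  # hypothetical stand-in for read_csv("train.csv")
    {"label": 1, "pixel0": 0, "pixel1": 255},
    {"label": 7, "pixel0": 128, "pixel1": 0},
]
```

The output holds the same rows as the R result but in a different order. This version walks image by image, while `gather()` stacks one source column at a time.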

	"""
	Multiclass SVMs (Crammer-Singer formulation).

	A pure Python re-implementation of:

	Large-scale Multiclass Support Vector Machine Training via Euclidean Projection onto the Simplex.
	Mathieu Blondel, Akinori Fujino, and Naonori Ueda.
	ICPR 2014.
	http://www.mblondel.org/publications/mblondel-icpr2014.pdf
	"""

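The core subroutine named in the paper title is the Euclidean projection onto the simplex {w : w >= 0, sum(w) = z}. A common sort-based way to compute it is sketched below in pure Python. It is a standard O(n log n) algorithm, not necessarily the exact routine used in this re-implementation, and the function name is hypothetical.

```python
def project_simplex(v, z=1.0):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = z} (sort-based)."""
    if z <= 0:
        raise ValueError("z must be positive")
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - z) / j
        if u_j - t > 0:  # the largest j satisfying this fixes the threshold
            theta = t
    # Shift every coordinate down by theta and clip at zero.
    return [max(x - theta, 0.0) for x in v]
```

For example, `project_simplex([1.0, 2.0, 3.0])` returns `[0.0, 0.0, 1.0]`. Only the largest coordinate survives the shift by theta = 2.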
	Wordlist ver 0.732 - EXPECT INCOMPATIBLE CHANGES;
	acrobat africa alaska albert albino album
	alcohol alex alpha amadeus amanda amazon
	america analog animal antenna antonio apollo
	april aroma artist aspirin athlete atlas
	banana bandit banjo bikini bingo bonus
	camera canada carbon casino catalog cinema
	citizen cobra comet compact complex context
	credit critic crystal culture david delta
	dialog diploma doctor domino dragon drama
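Word lists like this one are used to spell binary data as pronounceable words. As a generic illustration, an integer can be written in base len(words), with each digit replaced by a word. This is not the exact encoding scheme this list was published with, which is not shown here. The sketch uses only the 54 words visible above as a hypothetical alphabet (the full list is longer), and the function names are hypothetical.

```python
# The 54 words shown above, used as a hypothetical alphabet; the real list is longer.
WORDS = """
acrobat africa alaska albert albino album
alcohol alex alpha amadeus amanda amazon
america analog animal antenna antonio apollo
april aroma artist aspirin athlete atlas
banana bandit banjo bikini bingo bonus
camera canada carbon casino catalog cinema
citizen cobra comet compact complex context
credit critic crystal culture david delta
dialog diploma doctor domino dragon drama
""".split()
INDEX = {w: i for i, w in enumerate(WORDS)}


def encode(n, width=3):
    """Write non-negative n as `width` base-len(WORDS) digits, least significant first."""
    base = len(WORDS)
    if not 0 <= n < base ** width:
        raise ValueError("n does not fit in the requested number of words")
    out = []
    for _ in range(width):
        n, digit = divmod(n, base)
        out.append(WORDS[digit])
    return " ".join(out)


def decode(phrase):
    """Inverse of encode()."""
    n = 0
    for word in reversed(phrase.split()):
        n = n * len(WORDS) + INDEX[word]
    return n
```

With 54 words, three words cover 54**3 = 157,464 values. A longer list packs more bits into each word.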

	""" Non-negative matrix factorization for I divergence

	This code implements Lee and Seung's multiplicative updates algorithm
	for NMF with I divergence cost.

	Lee D. D., Seung H. S., Learning the parts of objects by non-negative
	matrix factorization. Nature, 1999
	"""
	# Author: Olivier Mangin
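The I divergence is D(V || WH) = sum_ij (V_ij log(V_ij / (WH)_ij) - V_ij + (WH)_ij). For this cost, Lee and Seung's multiplicative updates are as follows, where products and divisions are elementwise except the matrix products W^T(...) and (...)H^T:

- H <- H * (W^T (V / WH)) / (W^T 1)
- W <- W * ((V / WH) H^T) / (1 H^T)

A dependency-free sketch on nested lists follows. The small `eps` guard, the random initialization, and the function names are assumptions, not taken from this gist.

```python
import math
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def i_divergence(V, WH):
    """Generalized KL (I) divergence D(V || WH); 0 * log 0 is taken as 0."""
    total = 0.0
    for v_row, r_row in zip(V, WH):
        for v, r in zip(v_row, r_row):
            total += (v * math.log(v / r) if v > 0 else 0.0) - v + r
    return total


def nmf_i_divergence(V, rank, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for NMF under the I divergence (hypothetical helper)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        WH = matmul(W, H)
        ratio = [[v / (r + eps) for v, r in zip(vr, rr)] for vr, rr in zip(V, WH)]
        num = matmul(transpose(W), ratio)
        col_sums = [sum(W[i][a] for i in range(n)) for a in range(rank)]
        H = [[H[a][j] * num[a][j] / (col_sums[a] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * ((V / WH) H^T) / (1 H^T), using the freshly updated H
        WH = matmul(W, H)
        ratio = [[v / (r + eps) for v, r in zip(vr, rr)] for vr, rr in zip(V, WH)]
        num = matmul(ratio, transpose(H))
        row_sums = [sum(H[a]) for a in range(rank)]
        W = [[W[i][a] * num[i][a] / (row_sums[a] + eps) for a in range(rank)]
             for i in range(n)]
    return W, H
```

Because every update multiplies by a non-negative factor, W and H stay non-negative without any explicit projection. Lee and Seung show that the divergence does not increase under these updates.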

	Latency Comparison Numbers (~2012)
	----------------------------------
	L1 cache reference                      0.5 ns
	Branch mispredict                       5   ns
	L2 cache reference                      7   ns               14x L1 cache
	Mutex lock/unlock                      25   ns
	Main memory reference                 100   ns               20x L2 cache, 200x L1 cache
	Compress 1K bytes with Zippy        3,000   ns        3 us
	Send 1K bytes over 1 Gbps network  10,000   ns       10 us
	Read 4K randomly from SSD*        150,000   ns      150 us   ~1GB/sec SSD
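The ratio column can be recomputed directly from the latencies. The following sketch uses the values from the table above. They were copied by hand, so treat them as the ~2012 figures from this list, not current measurements. The dictionary and helper names are hypothetical.

```python
# Latencies in nanoseconds, copied from the table above (~2012 figures).
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ratio(slow, fast):
    """How many times slower operation `slow` is than operation `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


for name, ns in LATENCY_NS.items():
    print(f"{name:<36}{ns:>10,} ns {ratio(name, 'L1 cache reference'):>10,.0f}x L1")
```

From these values, a main-memory reference is exactly 200x an L1 reference. Compared with an L2 reference, it is 100 / 7, or about 14x.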

	"""
	Non-Negative Garotte implementation with scikit-learn
	"""

	# Author: Alexandre Gramfort
	# Jaques Grobler (__main__ script)
	#
	# License: BSD Style.

	import numpy as np
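Breiman's non-negative garrote shrinks ordinary least squares coefficients. Let beta be the OLS fit. The method solves min_c 0.5 * ||y - X diag(beta) c||^2 + alpha * sum(c) subject to c >= 0, and returns beta * c. The gist builds on scikit-learn. The dependency-free sketch below instead uses the normal equations for OLS and coordinate descent for the constrained lasso step. The function names, the 0.5 and `alpha` scaling, the missing intercept, and the iteration count are all assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols(X, y):
    """Least squares (no intercept) via the normal equations X^T X beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def non_negative_garrote(X, y, alpha, n_iter=500):
    """Return garrote coefficients beta_ols * c, with c >= 0 fit by coordinate descent."""
    beta = ols(X, y)
    Z = [[x * b for x, b in zip(row, beta)] for row in X]  # columns scaled by OLS coefs
    p = len(beta)
    c = [0.0] * p
    residual = list(y)  # y - Z c, with c = 0 initially
    col_sq = [sum(row[j] ** 2 for row in Z) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual that excludes c_j.
            rho = sum(row[j] * (r + row[j] * c[j]) for row, r in zip(Z, residual))
            new = max(0.0, (rho - alpha) / col_sq[j])  # soft-threshold, clipped at 0
            delta = new - c[j]
            if delta:
                residual = [r - row[j] * delta for row, r in zip(Z, residual)]
                c[j] = new
    return [b * cj for b, cj in zip(beta, c)]
```

With `alpha = 0`, the shrinkage factors converge to 1 and the OLS coefficients are returned unchanged. As `alpha` grows, factors hit zero and the matching variables drop out of the model.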