- A Statistical View of Deep Learning
- What My Deep Model Doesn't Know...
- Bayesian Methods for Machine Learning
- Deep Learning Summer School 2015
- Bengio's Deep learning course notes
- deeplearning.net tutorial
- UFLDL Tutorial
- Reading lists for new MILA students
- [What's the best way to go about transitioning to a ML career? Is it even realistic for someone with my background?](https://www.reddit.com/r/MachineLearning/comments/3sknex/whats_the_bes
object main {
  val coins = Array(1, 5, 10, 25)
  val minCoin = coins.reduceLeft(_ min _)
  var table: Map[Int, Seq[Int]] = Map()
  // sentinel "infinitely long" answer, used for amounts that cannot be made
  val infinity: Seq[Int] = for (i <- 1 to 100000) yield i
  // memoized recursion: shortest sequence of coins summing to n
  def calculateMinCoins(n: Int): Seq[Int] =
    if (n == 0) Seq()
    else if (n < minCoin) infinity
    else table.getOrElse(n, {
      val best = coins.filter(_ <= n).map(c => c +: calculateMinCoins(n - c)).minBy(_.length)
      table += (n -> best)
      best
    })
}
- When the branch predictor is wrong and speculatively executes code from a branch that is not taken, it can pollute the caches, causing much worse performance than just the wasted fetch/decode/ALU cycles.
- Retiring: all instructions retire (commit) in program order, at a maximum rate of 2 per cycle.
- i.e. the visible side-effects of an instruction are committed in order, even if executed out of order.
- An L1 hit takes 3 cycles and an L2 hit takes 25 cycles, i.e. L2 is ~8x slower.
- Main memory takes ~200 cycles, i.e. around 66x slower than L1.
- Retire control unit (RCU) can only store 64 instructions.
- L2 miss + full RCU can be a recipe for disaster:
- An L2 miss will not retire for 200+ cycles while the frontend keeps fetching (almost) 2 instructions/cycle, so after ~32 cycles (~64 instructions) the RCU is full and the entire pipeline must stall: the CPU can no longer execute (out of order) the instructions after the memory op to hide that memory latency. A rough worked example follows this list.
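
A back-of-envelope version of that worst case, plugging in the numbers above (64-entry RCU, 2 instructions/cycle, ~200-cycle main-memory access) and assuming the frontend sustains its peak rate:

- 64 entries / 2 instructions per cycle ≈ 32 cycles until the RCU fills up behind the stalled load.
- ~200 - 32 ≈ ~170 further cycles during which nothing can issue or retire.
- ~170 cycles * 2 instructions/cycle ≈ ~340 instructions of potential throughput lost to that single miss.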
// spinlock.h
#include <thread>

class Mutex
{
public:
    Mutex();
    Mutex(Mutex const&) = delete;
Showcases some interesting and non-obvious optimizations that compilers can make on and around atomics. In particular, I liked this example. The following code:
int x = 0;
std::atomic<int> y;
int dso() {
x = 0;
int z = y.load(std::memory_order_seq_cst);
y.store(0, std::memory_order_seq_cst);
x = 1;
return z;
}
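
The point, presumably, is that the first `x = 0` is a dead store: another thread could only observe that intermediate value of the non-atomic `x` by racing with the later `x = 1`, which is undefined behaviour, so the compiler may delete the store even though seq_cst atomic operations sit between the two writes. A hypothetical sketch of the allowed transformation (the name `dso_optimized` and the exact shape are my assumptions, not the gist's):

```cpp
#include <atomic>

int x = 0;
std::atomic<int> y;

// Sketch of a transformation a conforming compiler may perform on dso():
// the dead store `x = 0` has been eliminated across the atomics.
int dso_optimized() {
    int z = y.load(std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    x = 1;
    return z;
}
```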
// Example program
#include <iostream>
#include <string>
#include <vector>
#include <type_traits>
//---------------------
// Maybe and MyVector, two totally unrelated classes whose only commonality is that they are both type constructors of the same arity (i.e. 1) and order (i.e. 1).
//---------------------
template< typename T >
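
A rough sketch of where a setup like this typically goes (the member layouts, the `Functor` trait, and `fmap` below are assumptions for illustration, not the gist's code): the shared arity is exactly what lets both classes bind to a single `template <typename> class` parameter.

```cpp
#include <utility>
#include <vector>

template <typename T>
struct Maybe { bool has_value; T value; };

template <typename T>
struct MyVector { std::vector<T> data; };

// One trait specialization per unary type constructor; both Maybe and
// MyVector can be passed here only because they have the same arity (1)
// and order (1), so both match `template <typename> class F`.
template <template <typename> class F>
struct Functor;

template <>
struct Functor<Maybe> {
    template <typename T, typename Fn>
    static auto fmap(Maybe<T> const& m, Fn f) -> Maybe<decltype(f(m.value))> {
        using U = decltype(f(m.value));
        return m.has_value ? Maybe<U>{true, f(m.value)} : Maybe<U>{};
    }
};

template <>
struct Functor<MyVector> {
    template <typename T, typename Fn>
    static auto fmap(MyVector<T> const& v, Fn f)
        -> MyVector<decltype(f(std::declval<T const&>()))> {
        MyVector<decltype(f(std::declval<T const&>()))> out;
        for (auto const& x : v.data) out.data.push_back(f(x));
        return out;
    }
};
```

Usage would then look like `Functor<MyVector>::fmap(xs, [](int a) { return a + 1; })`, with the same call shape working for `Maybe`.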
http://courses.cms.caltech.edu/cs179/
http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
https://community.arm.com/graphics/b/blog
http://cdn.imgtec.com/sdk-documentation/PowerVR+Hardware.Architecture+Overview+for+Developers.pdf
http://cdn.imgtec.com/sdk-documentation/PowerVR+Series5.Architecture+Guide+for+Developers.pdf
https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/
https://www.imgtec.com/blog/the-dr-in-tbdr-deferred-rendering-in-rogue/
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-412605
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
https://community.arm.com/graphics/b/documents/posts/moving-mobile-graphics#siggraph2015
import numpy as np
from scipy.stats import norm
from math import log

N = 1000
true_loc = 10.0
true_stddev = 0.1
x_data = true_loc + (np.random.randn(N) * true_stddev)

def lognormalpdf(x, loc, scale):
    # total log-likelihood of the data under a Normal(loc, scale) model
    return np.sum(norm.logpdf(x, loc=loc, scale=scale))
import numpy as np
from skimage import filters

def optical_flow_lk(t0, t1, sigma):
    # setup the local linear systems of equations
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    # Gaussian-weighted structure tensor and right-hand side per pixel
    A00 = filters.gaussian(dx * dx, sigma)
    A11 = filters.gaussian(dy * dy, sigma)
    A01 = filters.gaussian(dx * dy, sigma)
    b0 = -filters.gaussian(dx * dt, sigma)
    b1 = -filters.gaussian(dy * dt, sigma)
    # solve each pixel's 2x2 system via Cramer's rule (eps avoids /0 in flat regions)
    det = A00 * A11 - A01 * A01 + 1e-12
    u = (A11 * b0 - A01 * b1) / det
    v = (A00 * b1 - A01 * b0) / det
    return u, v
import numpy as np
from skimage import filters
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

def optical_flow_hs(t0, t1, alpha):
    h, w = t0.shape[:2]
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0