$\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}$
$\newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}$
$\newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\partial #2^2}}$
$\newcommand{\numel}[1]{|#1|}$
$\newcommand{\pderivdim}[2]{\overset{\big[\numel {#1} \times \numel {#2} \big]}{\frac{\partial #1}{\partial #2}}}$
$\newcommand{\pderivdimg}[4]{\overset{\big[#3 \times #4 \big]}{\frac{\partial #1}{\partial #2}}}$
# Distributed Low-Bit Computation

Suppose we're trying to communicate a scalar parameter $\theta$ from a worker $W$ to a server $S$. $\theta$ changes with time $t$.

The worker simply communicates bits of $\theta$ asynchronously: if it sends a bit $b \in \{0, 1\}$ at time $t \in \mathbb Z^+$, we say that the worker communicated a message $(b, t)$. If the worker sends $M$ messages between times $t_1$ and $t_2$, we write $N_{t_1}^{t_2} = M$.

The server takes in these bits and uses them to build a distribution $p(\hat\theta)$ over the current value of $\theta$.

**Can we create an encoding with the following properties?**
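The setup above can be sketched directly in code. This is a minimal illustration of the defined quantities only (the encoding itself is exactly what the question above leaves open); the function names are ours, not from the source.

```python
def make_message(b, t):
    # A message is a pair (b, t): a bit b in {0, 1} sent at integer time t >= 1.
    assert b in (0, 1) and t >= 1
    return (b, t)

def n_messages(messages, t1, t2):
    # N_{t1}^{t2}: the number of messages the worker sent between times t1 and t2.
    return sum(1 for _, t in messages if t1 <= t <= t2)
```

For example, `n_messages([(1, 1), (0, 2), (1, 5)], 1, 2)` counts the two messages sent in the interval $[1, 2]$.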
$\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}$
$\newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\frac{}{}\partial #2^2}}$
$\newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}$
$\newcommand{\norm}[1]{\frac12\| #1 \|_2^2}$
$\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}$
$\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$
$\newcommand{\blue}[1]{\color{blue}{#1}}$
$\newcommand{\red}[1]{\color{red}{#1}}$
# 1) Simple Maximum Likelihood

Consider a binary variable $F$ generating an observation $X$ (graphically, $F \to X$). By Bayes' rule:

$$
p(F=1 \mid X=x) = \frac{p(X=x \mid F=1)\, p(F=1)}{p(X=x)} = \frac{p(X=x \mid F=1)\, p(F=1)}{p(X=x \mid F=0)\, p(F=0) + p(X=x \mid F=1)\, p(F=1)}
$$
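Numerically, this posterior is a one-liner. A minimal sketch (function and argument names are ours), assuming the two likelihoods and the prior are given:

```python
def posterior_f1(lik_f1, lik_f0, prior_f1):
    # p(F=1 | X=x) by Bayes' rule, for a binary variable F, where
    #   lik_f1 = p(X=x | F=1), lik_f0 = p(X=x | F=0), prior_f1 = p(F=1).
    evidence = lik_f0 * (1.0 - prior_f1) + lik_f1 * prior_f1  # p(X=x)
    return lik_f1 * prior_f1 / evidence
```

With a uniform prior the posterior reduces to the normalized likelihood: `posterior_f1(0.8, 0.2, 0.5)` gives `0.8`.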
# Temporal Networks

$\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}$

# The idea

Let
$(x, y)$ be the input and target data, and
$u_1, \dots, u_L$ be the pre-nonlinearity activations of a neural network, and
$w_1, \dots, w_L$ be the parameters, with $(\cdot w)(x) \triangleq x \cdot w$, and
$h_l(\cdot)$ be the nonlinearity of the $l$'th layer, and
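Under the usual feedforward convention (assumed here, since the recursion is not written out above), the pre-nonlinearity activations follow $u_l = h_{l-1}(u_{l-1}) \cdot w_l$ with $u_1 = x \cdot w_1$. A minimal sketch of that forward pass:

```python
import numpy as np

def forward(x, weights, nonlinearities):
    # Compute the pre-nonlinearity activations u_1, ..., u_L, assuming the
    # standard recursion u_l = h_{l-1}(u_{l-1}) @ w_l with u_1 = x @ w_1
    # (the text defines the map (.w)(x) := x . w).
    us = []
    a = x
    for w, h in zip(weights, nonlinearities):
        u = a @ w      # pre-nonlinearity activation u_l
        us.append(u)
        a = h(u)       # post-nonlinearity activation, fed to the next layer
    return us
```

With identity weight matrices and a ReLU nonlinearity, `forward` just clips negatives between layers, which makes the recursion easy to check by hand.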
# Generative Models

## Introduction

Generative models are models that learn the *distribution* of the data.

Suppose we have a collection of $N$ $D$-dimensional points: $\{x_1, ..., x_N\}$. Each $x_i$ might represent a vector of pixels in an image, or the words in a sentence.

In generative modeling, we imagine that these points are samples from a $D$-dimensional probability distribution. The distribution represents whatever real-world process was used to generate the data. Our objective is to learn the parameters of this distribution. This allows us to do things like
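As one concrete instance of "learning the parameters of this distribution" (the choice of a Gaussian family here is ours, for illustration): the maximum-likelihood fit of a multivariate Gaussian to the $N$ points has a closed form.

```python
import numpy as np

def fit_gaussian(X):
    # Maximum-likelihood estimate of a multivariate Gaussian's parameters.
    # X: (N, D) array of N D-dimensional points. Returns (mean, covariance).
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / len(X)  # MLE divides by N, not N - 1
    return mu, sigma
```

Once fit, the learned $(\mu, \Sigma)$ can be used to score new points or to sample fresh data from the model.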
$$
\newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pderivsq}[2]{\frac{\partial^2 #1}{\partial #2^2}}
\newcommand{\lderiv}[1]{\frac{\partial \mathcal L}{\partial #1}}
\newcommand{\pderivgiven}[3]{\left.\frac{\partial #1}{\partial #2}\right|_{#3}}
\newcommand{\norm}[1]{\frac12\| #1 \|_2^2}
\newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}}
\newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}
\newcommand{\blue}[1]{\color{blue}{#1}}
\newcommand{\red}[1]{\color{red}{#1}}
$$