@greeness
Created January 11, 2014 02:19
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pymatbridge as pymat\n",
"ip = get_ipython()\n",
"pymat.load_ipython_extension(ip)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%matlab\n",
"cd '/Users/greeness/dataset/ufldl'\n",
"visibleSize = 8*8; % number of input units \n",
"hiddenSize = 25; % number of hidden units \n",
"\n",
"% get data (64x10000 matrix)\n",
"patches = sampleIMAGES;\n",
"\n",
"% Obtain random parameters theta\n",
"theta = initializeParameters(hiddenSize, visibleSize);\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure the length of $\\theta$ is correct:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matlab hiddenSize*visibleSize * 2 + hiddenSize + visibleSize == size(theta, 1)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"ans =\n",
"\n",
" 1\n",
"\n"
]
}
],
"prompt_number": 23
},
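{
"cell_type": "markdown",
"metadata": {},
"source": [
"Worked out explicitly: with `hiddenSize = 25` and `visibleSize = 64`,\n",
"\n",
"$$2 \\cdot 25 \\cdot 64 + 25 + 64 = 3200 + 89 = 3289,$$\n",
"\n",
"one weight per entry of $W^{(1)}$ and $W^{(2)}$ plus one bias per hidden and output unit, so $\\theta$ is a 3289x1 vector."
]
},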
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's look inside the function `sparseAutoencoderCost()`:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%matlab\n",
"W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n",
"W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n",
"b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n",
"b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n",
"size(W1), size(W2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"ans =\n",
"\n",
" 25 64\n",
"\n",
"\n",
"ans =\n",
"\n",
" 64 25\n",
"\n"
]
}
],
"prompt_number": 24
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Forward propagation from the input layer to the hidden layer, then from the hidden layer to the output layer:\n",
"\n",
"$$\\begin{eqnarray} \n",
"z^{(2)} &=& W^{(1)}x + b^{(1)} \\\\\n",
"a^{(2)} &=& f(z^{(2)})\\\\\n",
"z^{(3)} &=& W^{(2)}a^{(2)} + b^{(2)} \\\\\n",
"h &=& a^{(3)} = f(z^{(3)})\\\\\n",
"\\end{eqnarray}$$"
]
},
{
"cell_type": "code",
"collapsed": true,
"input": [
"%%matlab\n",
"x = patches;\n",
"n = size(x, 2); % number of training examples\n",
"\n",
"% hidden layer\n",
"z2 = W1 * x + repmat(b1, 1, n);\n",
"a2 = sigmoid(z2); % activation values\n",
"\n",
"% output layer\n",
"z3 = W2 * a2 + repmat(b2, 1, n);\n",
"a3 = sigmoid(z3);\n",
"h = a3;\n",
"\n",
"size(a3)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"ans =\n",
"\n",
" 64 10000\n",
"\n"
]
}
],
"prompt_number": 31
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So the final output $h(x)$ is a 64x10000 matrix, the same size as the input patches, as an autoencoder requires. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matlab size(x)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"ans =\n",
"\n",
" 64 10000\n",
"\n"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[Reference](http://deeplearning.stanford.edu/wiki/index.php/Backpropagation_Algorithm)\n",
"\n",
"The **cost function** contains three terms:\n",
"\n",
"1. Squared error term\n",
"2. Regularization term\n",
"3. Sparsity penalty\n",
"\n",
"We will start with the squared-error term. The cost w.r.t. a single training example $(x, y)$ is:\n",
"\n",
"$$ J(W,b; x,y) = \\frac{1}{2}||h(x) - y||^2$$\n",
"\n",
"Given a set of $m$ training examples, the overall cost function is\n",
"\n",
"$$J(W,b) = \\frac{1}{m} \\sum_{i=1}^m J(W,b; x^{(i)}, y^{(i)})$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since in our autoencoder $y^{(i)} = x^{(i)}$, $$J(W,b) = \\frac{1}{m}\\sum_{i=1}^m \\frac{1}{2}\\left\\|h(x^{(i)}) - x^{(i)}\\right\\|^2$$"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%matlab\n",
"\n",
"J = mean(sum((h-x).^2))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"J =\n",
"\n",
" 1.6314\n",
"\n"
]
}
],
"prompt_number": 60
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Time to compute the gradient with respect to $\\theta = (W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)})$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\\begin{eqnarray}\n",
"\\delta^{(3)} &=& - (y - a^{(3)}) \\bullet f'(z^{(3)})\\\\\n",
"\\delta^{(2)} &=& \\left((W^{(2)})^T \\delta^{(3)} \\right) \\bullet f'(z^{(2)})\\\\\n",
"\\nabla_{W^{(2)}} J(W,b;x,y) &=& \\delta^{(3)} (a^{(2)})^T\\\\\n",
"\\nabla_{W^{(1)}} J(W,b;x,y) &=& \\delta^{(2)} (a^{(1)})^T\\\\\n",
"\\nabla_{b^{(2)}} J(W,b;x,y) &=& \\delta^{(3)} \\\\\n",
"\\nabla_{b^{(1)}} J(W,b;x,y) &=& \\delta^{(2)} \\\\\n",
"\\end{eqnarray}$$"
]
},
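{
"cell_type": "markdown",
"metadata": {},
"source": [
"The first line is just the chain rule applied to the squared-error term: since $a^{(3)} = f(z^{(3)})$,\n",
"\n",
"$$\\delta^{(3)} = \\frac{\\partial}{\\partial z^{(3)}}\\, \\frac{1}{2}\\left\\|y - a^{(3)}\\right\\|^2 = -(y - a^{(3)}) \\bullet f'(z^{(3)})$$\n",
"\n",
"where $\\bullet$ denotes the element-wise product, matching the `.*` in the MATLAB code."
]
},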
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the `sigmoid` function, $f'(z) = f(z)(1-f(z))$."
]
},
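{
"cell_type": "markdown",
"metadata": {},
"source": [
"This identity is quick to verify: with $f(z) = \\frac{1}{1+e^{-z}}$,\n",
"\n",
"$$f'(z) = \\frac{e^{-z}}{(1+e^{-z})^2} = \\frac{1}{1+e^{-z}} \\cdot \\frac{e^{-z}}{1+e^{-z}} = f(z)\\,(1-f(z))$$\n",
"\n",
"so the deltas can reuse the stored activations `a2` and `a3` instead of recomputing the sigmoid."
]
},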
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%matlab\n",
"delta3 = -(x - a3) .* (a3 .* (1-a3));\n",
"delta2 = (W2' * delta3) .* (a2 .* (1-a2));"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 63
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"matlab size(delta3), size(delta2)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "display_data",
"text": [
"\n",
"ans =\n",
"\n",
" 64 10000\n",
"\n",
"\n",
"ans =\n",
"\n",
" 25 10000\n",
"\n"
]
}
],
"prompt_number": 64
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%matlab\n",
"a1 = x;\n",
"\n",
"W2grad = delta3 * a2' / n;   % average over the n training examples\n",
"W1grad = delta2 * a1' / n;\n",
"b2grad = sum(delta3, 2) / n; % bias gradients must be averaged too, so they match size(b2)\n",
"b1grad = sum(delta2, 2) / n;"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 52
},
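{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before plugging these gradients into `minFunc`, it is worth checking them numerically (the UFLDL exercise includes a `computeNumericalGradient` stub for exactly this). Each analytic partial derivative should agree with a centered difference\n",
"\n",
"$$\\frac{\\partial J}{\\partial \\theta_i} \\approx \\frac{J(\\theta + \\epsilon e_i) - J(\\theta - \\epsilon e_i)}{2\\epsilon}$$\n",
"\n",
"with, say, $\\epsilon = 10^{-4}$, to several significant digits."
]
},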
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Putting it together, the **`sparseAutoencoderCost`** function so far (squared-error term only; the regularization and sparsity terms are not implemented yet) looks like this:\n",
"```matlab\n",
"function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...\n",
" lambda, sparsityParam, beta, data)\n",
"\n",
"% visibleSize: the number of input units (probably 64) \n",
"% hiddenSize: the number of hidden units (probably 25) \n",
"% lambda: weight decay parameter\n",
"% sparsityParam: The desired average activation for the hidden units (denoted in the lecture\n",
"% notes by the greek alphabet rho, which looks like a lower-case \"p\").\n",
"% beta: weight of sparsity penalty term\n",
"% data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. \n",
" \n",
"% The input theta is a vector (because minFunc expects the parameters to be a vector). \n",
"% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this \n",
"% follows the notation convention of the lecture notes. \n",
"\n",
"W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n",
"W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n",
"b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n",
"b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n",
"\n",
"% Cost and gradient variables (your code needs to compute these values). \n",
"% Here, we initialize them to zeros. \n",
"cost = 0;\n",
"W1grad = zeros(size(W1)); \n",
"W2grad = zeros(size(W2));\n",
"b1grad = zeros(size(b1)); \n",
"b2grad = zeros(size(b2));\n",
"\n",
"%% ---------- YOUR CODE HERE --------------------------------------\n",
"% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,\n",
"% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.\n",
"%\n",
"% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.\n",
"% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions\n",
"% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with\n",
"% respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b) \n",
"% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term \n",
"% [(1/m) \\Delta W^{(1)} + \\lambda W^{(1)}] in the last block of pseudo-code in Section 2.2 \n",
"% of the lecture notes (and similarly for W2grad, b1grad, b2grad).\n",
"% \n",
"% Stated differently, if we were using batch gradient descent to optimize the parameters,\n",
"% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2. \n",
"% \n",
"\n",
"x = data;\n",
"n = size(x, 2); % number of training examples\n",
"\n",
"% hidden layer\n",
"z2 = W1 * x + repmat(b1, 1, n);\n",
"a2 = sigmoid(z2); % activation values\n",
"\n",
"% output layer\n",
"z3 = W2 * a2 + repmat(b2, 1, n);\n",
"a3 = sigmoid(z3);\n",
"h = a3;\n",
"\n",
"cost = mean(sum((h - x).^2)) / 2;  % squared-error term, with the 1/2 from the cost definition\n",
"\n",
"% gradient\n",
"delta3 = -(x - a3) .* (a3 .* (1-a3));\n",
"delta2 = (W2' * delta3) .* (a2 .* (1-a2));\n",
"a1 = x;\n",
"\n",
"W2grad = delta3 * a2' / n;   % average over the n training examples\n",
"W1grad = delta2 * a1' / n;\n",
"b2grad = sum(delta3, 2) / n; % averaged bias gradients, same size as b2\n",
"b1grad = sum(delta2, 2) / n;\n",
"\n",
"%-------------------------------------------------------------------\n",
"% After computing the cost and gradient, we will convert the gradients back\n",
"% to a vector format (suitable for minFunc). Specifically, we will unroll\n",
"% your gradient matrices into a vector.\n",
"\n",
"grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];\n",
"\n",
"end\n",
"\n",
"%-------------------------------------------------------------------\n",
"% Here's an implementation of the sigmoid function, which you may find useful\n",
"% in your computation of the costs and the gradients. This inputs a (row or\n",
"% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)). \n",
"\n",
"function sigm = sigmoid(x)\n",
" \n",
" sigm = 1 ./ (1 + exp(-x));\n",
"end\n",
"\n",
"\n",
"```"
]
}
],
"metadata": {}
}
]
}