Created January 11, 2014 02:19
| { | |
| "metadata": { | |
| "name": "" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "import pymatbridge as pymat\n", | |
| "ip = get_ipython()\n", | |
| "pymat.load_ipython_extension(ip)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 21 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "cd '/Users/greeness/dataset/ufldl'\n", | |
| "visibleSize = 8*8; % number of input units \n", | |
| "hiddenSize = 25; % number of hidden units \n", | |
| "\n", | |
| "% get data (64x10000 matrix)\n", | |
| "patches = sampleIMAGES;\n", | |
| "\n", | |
| "% Obtain random parameters theta\n", | |
| "theta = initializeParameters(hiddenSize, visibleSize);\n" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 22 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Make sure the length of $\\theta$ is correct:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab hiddenSize*visibleSize * 2 + hiddenSize + visibleSize == size(theta, 1)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 1\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 23 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's look inside the function `sparseAutoencoderCost()`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n", | |
| "W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n", | |
| "b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n", | |
| "b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n", | |
| "size(W1), size(W2)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 25 64\n", | |
| "\n", | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 25\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 24 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Forward propagation from the input layer to the hidden layer, then from the hidden layer to the output layer: \n", | |
| "\n", | |
| "$$\\begin{eqnarray} \n", | |
| "z^{(2)} &=& W^{(1)}x + b^{(1)} \\\\\n", | |
| "a^{(2)} &=& f(z^{(2)})\\\\\n", | |
| "z^{(3)} &=& W^{(2)}a^{(2)} + b^{(2)} \\\\\n", | |
| "h &=& a^{(3)} = f(z^{(3)})\\\\\n", | |
| "\\end{eqnarray}$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "%%matlab\n", | |
| "x = patches;\n", | |
| "n = size(x, 2); % number of training examples\n", | |
| "\n", | |
| "% hidden layer\n", | |
| "z2 = W1 * x + repmat(b1, 1, n);\n", | |
| "a2 = sigmoid(z2); % activation values\n", | |
| "\n", | |
| "% output layer\n", | |
| "z3 = W2 * a2 + repmat(b2, 1, n);\n", | |
| "a3 = sigmoid(z3);\n", | |
| "h = a3;\n", | |
| "\n", | |
| "size(a3)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 31 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "So the final output $h(x)$ is a 64x10000 matrix, a size identical to the input image patches. " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab size(x)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 29 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "[Reference](http://deeplearning.stanford.edu/wiki/index.php/Backpropagation_Algorithm)\n", | |
| "\n", | |
| "**Cost Function** contains 3 terms:\n", | |
| "\n", | |
| "1. Squared error term\n", | |
| "2. Regularization term\n", | |
| "3. Sparsity penalty\n", | |
| "\n", | |
| "We will start with the squared error term. The cost w.r.t. a single training example $(x, y)$ is:\n", | |
| "\n", | |
| "$$ J(W,b; x,y) = \\frac{1}{2}||h(x) - y||^2$$\n", | |
| "\n", | |
| "Given a set of $m$ training examples, the overall cost function is\n", | |
| "\n", | |
| "$$J(W,b) = \\frac{1}{m} \\sum_{i=1}^m J(W,b; x^{(i)}, y^{(i)})$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Since in our autoencoder $y^{(i)} = x^{(i)}$: $$J(W,b) = \\frac{1}{2m}\\sum_{i=1}^m \\|h(x^{(i)}) - x^{(i)}\\|^2$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "\n", | |
| "J = mean(sum((h-x).^2))" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "J =\n", | |
| "\n", | |
| " 1.6314\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 60 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Time to compute the gradient for $\\theta = (W_1, W_2, b_1, b_2)$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "$$\\begin{eqnarray}\n", | |
| "\\delta^{(3)} &=& - (y - a^{(3)}) \\bullet f'(z^{(3)})\\\\\n", | |
| "\\delta^{(2)} &=& \\left((W^{(2)})^T \\delta^{(3)} \\right) \\bullet f'(z^{(2)})\\\\\n", | |
| "\\Delta_{W^{(2)}} J(W,b) &=& \\delta^{(3)} (a^{(2)})^T\\\\\n", | |
| "\\Delta_{W^{(1)}}J(W,b) &=& \\delta^{(2)} (a^{(1)})^T\\\\\n", | |
| "\\Delta_{b^{(2)}}J(W,b) &=& \\delta^{(3)} \\\\\n", | |
| "\\Delta_{b^{(1)}}J(W,b) &=& \\delta^{(2)} \\\\\n", | |
| "\\end{eqnarray}$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For `sigmoid` function, $f'(z) = f(z) (1-f(z))$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "delta3 = -(x - a3) .* (a3 .* (1-a3));\n", | |
| "delta2 = (W2' * delta3) .* (a2 .* (1-a2));" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 63 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab size(delta3), size(delta2)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n", | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 25 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 64 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "a1 = x;\n", | |
| "\n", | |
| "% average over the n examples; bias gradients sum delta over columns\n", | |
| "W2grad = delta3 * a2' / n;\n", | |
| "W1grad = delta2 * a1' / n;\n", | |
| "b2grad = sum(delta3, 2) / n;\n", | |
| "b1grad = sum(delta2, 2) / n;" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 52 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The final **`sparseAutoencoderCost`** function looks like this:\n", | |
| "```matlab\n", | |
| "function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...\n", | |
| " lambda, sparsityParam, beta, data)\n", | |
| "\n", | |
| "% visibleSize: the number of input units (probably 64) \n", | |
| "% hiddenSize: the number of hidden units (probably 25) \n", | |
| "% lambda: weight decay parameter\n", | |
| "% sparsityParam: The desired average activation for the hidden units (denoted in the lecture\n", | |
| "% notes by the greek alphabet rho, which looks like a lower-case \"p\").\n", | |
| "% beta: weight of sparsity penalty term\n", | |
| "% data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. \n", | |
| " \n", | |
| "% The input theta is a vector (because minFunc expects the parameters to be a vector). \n", | |
| "% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this \n", | |
| "% follows the notation convention of the lecture notes. \n", | |
| "\n", | |
| "W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n", | |
| "W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n", | |
| "b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n", | |
| "b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n", | |
| "\n", | |
| "% Cost and gradient variables (your code needs to compute these values). \n", | |
| "% Here, we initialize them to zeros. \n", | |
| "cost = 0;\n", | |
| "W1grad = zeros(size(W1)); \n", | |
| "W2grad = zeros(size(W2));\n", | |
| "b1grad = zeros(size(b1)); \n", | |
| "b2grad = zeros(size(b2));\n", | |
| "\n", | |
| "%% ---------- YOUR CODE HERE --------------------------------------\n", | |
| "% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,\n", | |
| "% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.\n", | |
| "%\n", | |
| "% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.\n", | |
| "% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions\n", | |
| "% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with\n", | |
| "% respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b) \n", | |
| "% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term \n", | |
| "% [(1/m) \\Delta W^{(1)} + \\lambda W^{(1)}] in the last block of pseudo-code in Section 2.2 \n", | |
| "% of the lecture notes (and similarly for W2grad, b1grad, b2grad).\n", | |
| "% \n", | |
| "% Stated differently, if we were using batch gradient descent to optimize the parameters,\n", | |
| "% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2. \n", | |
| "% \n", | |
| "\n", | |
| "x = data;\n", | |
| "n = size(x, 2); % number of training examples\n", | |
| "\n", | |
| "% hidden layer\n", | |
| "z2 = W1 * x + repmat(b1, 1, n);\n", | |
| "a2 = sigmoid(z2); % activation values\n", | |
| "\n", | |
| "% output layer\n", | |
| "z3 = W2 * a2 + repmat(b2, 1, n);\n", | |
| "a3 = sigmoid(z3);\n", | |
| "h = a3;\n", | |
| "\n", | |
| "% squared-error cost: (1/m) * sum over examples of (1/2)||h - x||^2\n", | |
| "cost = 0.5 * mean(sum((h-x).^2));\n", | |
| "\n", | |
| "% gradient (backpropagation)\n", | |
| "delta3 = -(x - a3) .* (a3 .* (1-a3));\n", | |
| "delta2 = (W2' * delta3) .* (a2 .* (1-a2));\n", | |
| "a1 = x;\n", | |
| "\n", | |
| "% average over the n examples; bias gradients sum delta over columns\n", | |
| "W2grad = delta3 * a2' / n;\n", | |
| "W1grad = delta2 * a1' / n;\n", | |
| "b2grad = sum(delta3, 2) / n;\n", | |
| "b1grad = sum(delta2, 2) / n;\n", | |
| "\n", | |
| "%-------------------------------------------------------------------\n", | |
| "% After computing the cost and gradient, we will convert the gradients back\n", | |
| "% to a vector format (suitable for minFunc). Specifically, we will unroll\n", | |
| "% your gradient matrices into a vector.\n", | |
| "\n", | |
| "grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];\n", | |
| "\n", | |
| "end\n", | |
| "\n", | |
| "%-------------------------------------------------------------------\n", | |
| "% Here's an implementation of the sigmoid function, which you may find useful\n", | |
| "% in your computation of the costs and the gradients. This inputs a (row or\n", | |
| "% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)). \n", | |
| "\n", | |
| "function sigm = sigmoid(x)\n", | |
| " \n", | |
| " sigm = 1 ./ (1 + exp(-x));\n", | |
| "end\n", | |
| "\n", | |
| "\n", | |
| "```" | |
| ] | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
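The backpropagation equations, including the 1/m averaging, can be validated with a finite-difference check. A minimal NumPy sketch with small, arbitrary layer sizes (all names hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(W1, W2, b1, b2, x):
    """Squared-error cost and backprop gradients (no regularization/sparsity)."""
    n = x.shape[1]
    a2 = sigmoid(W1 @ x + b1)
    a3 = sigmoid(W2 @ a2 + b2)
    cost = 0.5 * np.mean(np.sum((a3 - x) ** 2, axis=0))
    # delta terms from the backprop equations, using f'(z) = f(z)(1 - f(z)).
    delta3 = -(x - a3) * a3 * (1 - a3)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # Average over the n examples; bias gradients sum delta over columns.
    W2grad = delta3 @ a2.T / n
    W1grad = delta2 @ x.T / n
    b2grad = delta3.sum(axis=1, keepdims=True) / n
    b1grad = delta2.sum(axis=1, keepdims=True) / n
    return cost, W1grad, W2grad, b1grad, b2grad

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 4)) * 0.1   # 3 hidden, 4 visible units
W2 = rng.standard_normal((4, 3)) * 0.1
b1 = np.zeros((3, 1))
b2 = np.zeros((4, 1))
x = rng.standard_normal((4, 20))          # 20 toy examples

cost, W1g, W2g, b1g, b2g = cost_and_grads(W1, W2, b1, b2, x)

# Finite-difference check on one entry of W1.
eps = 1e-5
Wp = W1.copy(); Wp[0, 0] += eps
Wm = W1.copy(); Wm[0, 0] -= eps
num = (cost_and_grads(Wp, W2, b1, b2, x)[0] -
       cost_and_grads(Wm, W2, b1, b2, x)[0]) / (2 * eps)
print(abs(num - W1g[0, 0]))  # should be tiny
```

This is the same check the UFLDL exercise performs with `computeNumericalGradient`; if the bias gradients were left as full delta matrices, the unrolled gradient vector would not even match the length of `theta`.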