Created January 11, 2014 02:19
| { | |
| "metadata": { | |
| "name": "" | |
| }, | |
| "nbformat": 3, | |
| "nbformat_minor": 0, | |
| "worksheets": [ | |
| { | |
| "cells": [ | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "import pymatbridge as pymat\n", | |
| "ip = get_ipython()\n", | |
| "pymat.load_ipython_extension(ip)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 21 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "cd '/Users/greeness/dataset/ufldl'\n", | |
| "visibleSize = 8*8; % number of input units \n", | |
| "hiddenSize = 25; % number of hidden units \n", | |
| "\n", | |
| "% get data (64x10000 matrix)\n", | |
| "patches = sampleIMAGES;\n", | |
| "\n", | |
| "% Obtain random parameters theta\n", | |
| "theta = initializeParameters(hiddenSize, visibleSize);\n" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 22 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Make sure the length of $\\theta$ is correct:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab hiddenSize*visibleSize * 2 + hiddenSize + visibleSize == size(theta, 1)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 1\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 23 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Now let's look inside the function `sparseAutoencoderCost()`:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n", | |
| "W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n", | |
| "b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n", | |
| "b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n", | |
| "size(W1), size(W2)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 25 64\n", | |
| "\n", | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 25\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 24 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Forward propagation from the input layer to the hidden layer, then from the hidden layer to the output layer: \n", | |
| "\n", | |
| "$$\\begin{eqnarray} \n", | |
| "z^{(2)} &=& W^{(1)}x + b^{(1)} \\\\\n", | |
| "a^{(2)} &=& f(z^{(2)})\\\\\n", | |
| "z^{(3)} &=& W^{(2)}a^{(2)} + b^{(2)} \\\\\n", | |
| "h &=& a^{(3)} = f(z^{(3)})\\\\\n", | |
| "\\end{eqnarray}$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": true, | |
| "input": [ | |
| "%%matlab\n", | |
| "x = patches;\n", | |
| "n = size(x, 2); % number of training examples\n", | |
| "\n", | |
| "% hidden layer\n", | |
| "z2 = W1 * x + repmat(b1, 1, n);\n", | |
| "a2 = sigmoid(z2); % activation values\n", | |
| "\n", | |
| "% output layer\n", | |
| "z3 = W2 * a2 + repmat(b2, 1, n);\n", | |
| "a3 = sigmoid(z3);\n", | |
| "h = a3;\n", | |
| "\n", | |
| "size(a3)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 31 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "So the final output $h(x)$ is a 64x10000 matrix, a size identical to the input image patches. " | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab size(x)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 29 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "[Reference](http://deeplearning.stanford.edu/wiki/index.php/Backpropagation_Algorithm)\n", | |
| "\n", | |
| "**Cost Function** contains 3 terms:\n", | |
| "\n", | |
| "1. Squared error term\n", | |
| "2. Regularization term\n", | |
| "3. Sparsity penalty\n", | |
| "\n", | |
| "We will start with the squared error term. The cost w.r.t. a single training example $(x, y)$ is:\n", | |
| "\n", | |
| "$$ J(W,b; x,y) = \\frac{1}{2}||h(x) - y||^2$$\n", | |
| "\n", | |
| "Given a set of $m$ training examples, the overall cost function is\n", | |
| "\n", | |
| "$$J(W,b) = \\frac{1}{m} \\sum_{i=1}^m J(W,b; x^{(i)}, y^{(i)})$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Since in our autoencoder $y^{(i)} = x^{(i)}$: $$J(W,b) = \\frac{1}{2m}\\sum_{i=1}^m \\|h(x^{(i)}) - x^{(i)}\\|^2$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "\n", | |
| "J = mean(sum((h-x).^2))" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "J =\n", | |
| "\n", | |
| " 1.6314\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 60 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "Time to compute the gradient for $\\theta = (W_1, W_2, b_1, b_2)$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "$$\\begin{eqnarray}\n", | |
| "\\delta^{(3)} &=& - (y - a^{(3)}) \\bullet f'(z^{(3)})\\\\\n", | |
| "\\delta^{(2)} &=& \\left((W^{(2)})^T \\delta^{(3)} \\right) \\bullet f'(z^{(2)})\\\\\n", | |
| "\\Delta_{W^{(2)}} J(W,b) &=& \\delta^{(3)} (a^{(2)})^T\\\\\n", | |
| "\\Delta_{W^{(1)}}J(W,b) &=& \\delta^{(2)} (a^{(1)})^T\\\\\n", | |
| "\\Delta_{b^{(2)}}J(W,b) &=& \\delta^{(3)} \\\\\n", | |
| "\\Delta_{b^{(1)}}J(W,b) &=& \\delta^{(2)} \\\\\n", | |
| "\\end{eqnarray}$$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "For `sigmoid` function, $f'(z) = f(z) (1-f(z))$" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "delta3 = -(x - a3) .* (a3 .* (1-a3));\n", | |
| "delta2 = (W2' * delta3) .* (a2 .* (1-a2));" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 63 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%matlab size(delta3), size(delta2)" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "metadata": {}, | |
| "output_type": "display_data", | |
| "text": [ | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 64 10000\n", | |
| "\n", | |
| "\n", | |
| "ans =\n", | |
| "\n", | |
| " 25 10000\n", | |
| "\n" | |
| ] | |
| } | |
| ], | |
| "prompt_number": 64 | |
| }, | |
| { | |
| "cell_type": "code", | |
| "collapsed": false, | |
| "input": [ | |
| "%%matlab\n", | |
| "a1 = x;\n", | |
| "\n", | |
| "% average over the n examples; bias gradients sum delta over columns\n", | |
| "W2grad = delta3 * a2' / n;\n", | |
| "W1grad = delta2 * a1' / n;\n", | |
| "b2grad = sum(delta3, 2) / n;\n", | |
| "b1grad = sum(delta2, 2) / n;" | |
| ], | |
| "language": "python", | |
| "metadata": {}, | |
| "outputs": [], | |
| "prompt_number": 52 | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "The final **`sparseAutoencoderCost`** function looks like this:\n", | |
| "```matlab\n", | |
| "function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...\n", | |
| " lambda, sparsityParam, beta, data)\n", | |
| "\n", | |
| "% visibleSize: the number of input units (probably 64) \n", | |
| "% hiddenSize: the number of hidden units (probably 25) \n", | |
| "% lambda: weight decay parameter\n", | |
| "% sparsityParam: The desired average activation for the hidden units (denoted in the lecture\n", | |
| "% notes by the greek alphabet rho, which looks like a lower-case \"p\").\n", | |
| "% beta: weight of sparsity penalty term\n", | |
| "% data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. \n", | |
| " \n", | |
| "% The input theta is a vector (because minFunc expects the parameters to be a vector). \n", | |
| "% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this \n", | |
| "% follows the notation convention of the lecture notes. \n", | |
| "\n", | |
| "W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);\n", | |
| "W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);\n", | |
| "b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);\n", | |
| "b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);\n", | |
| "\n", | |
| "% Cost and gradient variables (your code needs to compute these values). \n", | |
| "% Here, we initialize them to zeros. \n", | |
| "cost = 0;\n", | |
| "W1grad = zeros(size(W1)); \n", | |
| "W2grad = zeros(size(W2));\n", | |
| "b1grad = zeros(size(b1)); \n", | |
| "b2grad = zeros(size(b2));\n", | |
| "\n", | |
| "%% ---------- YOUR CODE HERE --------------------------------------\n", | |
| "% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder,\n", | |
| "% and the corresponding gradients W1grad, W2grad, b1grad, b2grad.\n", | |
| "%\n", | |
| "% W1grad, W2grad, b1grad and b2grad should be computed using backpropagation.\n", | |
| "% Note that W1grad has the same dimensions as W1, b1grad has the same dimensions\n", | |
| "% as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with\n", | |
| "% respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b) \n", | |
| "% with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term \n", | |
| "% [(1/m) \\Delta W^{(1)} + \\lambda W^{(1)}] in the last block of pseudo-code in Section 2.2 \n", | |
| "% of the lecture notes (and similarly for W2grad, b1grad, b2grad).\n", | |
| "% \n", | |
| "% Stated differently, if we were using batch gradient descent to optimize the parameters,\n", | |
| "% the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2. \n", | |
| "% \n", | |
| "\n", | |
| "x = data;\n", | |
| "n = size(x, 2); % number of training examples\n", | |
| "\n", | |
| "% hidden layer\n", | |
| "z2 = W1 * x + repmat(b1, 1, n);\n", | |
| "a2 = sigmoid(z2); % activation values\n", | |
| "\n", | |
| "% output layer\n", | |
| "z3 = W2 * a2 + repmat(b2, 1, n);\n", | |
| "a3 = sigmoid(z3);\n", | |
| "h = a3;\n", | |
| "\n", | |
| "% squared-error cost: (1/m) * sum over examples of (1/2)||h - x||^2\n", | |
| "cost = 0.5 * mean(sum((h-x).^2));\n", | |
| "\n", | |
| "% gradient (backpropagation)\n", | |
| "delta3 = -(x - a3) .* (a3 .* (1-a3));\n", | |
| "delta2 = (W2' * delta3) .* (a2 .* (1-a2));\n", | |
| "a1 = x;\n", | |
| "\n", | |
| "% average over the n examples; bias gradients sum delta over columns\n", | |
| "W2grad = delta3 * a2' / n;\n", | |
| "W1grad = delta2 * a1' / n;\n", | |
| "b2grad = sum(delta3, 2) / n;\n", | |
| "b1grad = sum(delta2, 2) / n;\n", | |
| "\n", | |
| "%-------------------------------------------------------------------\n", | |
| "% After computing the cost and gradient, we will convert the gradients back\n", | |
| "% to a vector format (suitable for minFunc). Specifically, we will unroll\n", | |
| "% your gradient matrices into a vector.\n", | |
| "\n", | |
| "grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];\n", | |
| "\n", | |
| "end\n", | |
| "\n", | |
| "%-------------------------------------------------------------------\n", | |
| "% Here's an implementation of the sigmoid function, which you may find useful\n", | |
| "% in your computation of the costs and the gradients. This inputs a (row or\n", | |
| "% column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)). \n", | |
| "\n", | |
| "function sigm = sigmoid(x)\n", | |
| " \n", | |
| " sigm = 1 ./ (1 + exp(-x));\n", | |
| "end\n", | |
| "\n", | |
| "\n", | |
| "```" | |
| ] | |
| } | |
| ], | |
| "metadata": {} | |
| } | |
| ] | |
| } |
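The backpropagation equations, including the 1/m averaging, can be validated with a finite-difference check. A minimal NumPy sketch with small, arbitrary layer sizes (all names hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_and_grads(W1, W2, b1, b2, x):
    """Squared-error cost and backprop gradients (no regularization/sparsity)."""
    n = x.shape[1]
    a2 = sigmoid(W1 @ x + b1)
    a3 = sigmoid(W2 @ a2 + b2)
    cost = 0.5 * np.mean(np.sum((a3 - x) ** 2, axis=0))
    # delta terms from the backprop equations, using f'(z) = f(z)(1 - f(z)).
    delta3 = -(x - a3) * a3 * (1 - a3)
    delta2 = (W2.T @ delta3) * a2 * (1 - a2)
    # Average over the n examples; bias gradients sum delta over columns.
    W2grad = delta3 @ a2.T / n
    W1grad = delta2 @ x.T / n
    b2grad = delta3.sum(axis=1, keepdims=True) / n
    b1grad = delta2.sum(axis=1, keepdims=True) / n
    return cost, W1grad, W2grad, b1grad, b2grad

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 4)) * 0.1   # 3 hidden, 4 visible units
W2 = rng.standard_normal((4, 3)) * 0.1
b1 = np.zeros((3, 1))
b2 = np.zeros((4, 1))
x = rng.standard_normal((4, 20))          # 20 toy examples

cost, W1g, W2g, b1g, b2g = cost_and_grads(W1, W2, b1, b2, x)

# Finite-difference check on one entry of W1.
eps = 1e-5
Wp = W1.copy(); Wp[0, 0] += eps
Wm = W1.copy(); Wm[0, 0] -= eps
num = (cost_and_grads(Wp, W2, b1, b2, x)[0] -
       cost_and_grads(Wm, W2, b1, b2, x)[0]) / (2 * eps)
print(abs(num - W1g[0, 0]))  # should be tiny
```

This is the same check the UFLDL exercise performs with `computeNumericalGradient`; if the bias gradients were left as full delta matrices, the unrolled gradient vector would not even match the length of `theta`.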