bartvm · February 11, 2015 13:51
diff --git a/blocks.ipynb b/blocks.ipynb
 {
 "metadata": {
  "name": "",
  "signature": "sha256:cc9c83261b0099cc7145214f2ed4f1fb2974488460837623c10400673d30d8f5"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "# Bricks\n",
      "---"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Bricks are *parameterized Theano ops*. They act on Theano variables and output Theano variables, but they own and manage parameters."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.bricks import Linear\n",
      "from blocks.initialization import IsotropicGaussian\n",
      "from theano import tensor\n",
      "\n",
      "x = tensor.matrix('features')\n",
      "\n",
      "linear = Linear(input_dim=100, output_dim=50,\n",
      "                weights_init=IsotropicGaussian(0.01),\n",
      "                use_bias=False)\n",
      "y = linear.apply(x)\n",
      "\n",
      "print \"The inputs are '{}', outputs '{}'\".format(linear.apply.inputs,\n",
      "                                                 linear.apply.outputs)\n",
      "print \"The name of the output variable is '{}'\".format(y)\n",
      "print \"Linear has the parameters '{}'\".format(linear.params)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "The inputs are '['input_']', outputs '['output']'\n",
        "The name of the output variable is 'linear_apply_output'\n",
        "Linear has the parameters '[W]'\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Bricks are lazy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Bricks can be constructed lazily. This means that a brick can be partially configured at construction time, with the remaining configuration set later."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear2 = Linear(output_dim=30, name='linear2')\n",
      "print \"What is linear2's input dimension?\", linear2.get_dim('input_')\n",
      "print \"Does linear2 have parameters?\", hasattr(linear2, 'params')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "What is linear2's input dimension? None\n",
        "Does linear2 have parameters? False\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "try:\n",
      "    y = linear2.apply(x)\n",
      "except Exception as e:\n",
      "    print \"Tried to initialize, but failed: \\n\"\n",
      "    print e"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Tried to initialize, but failed: \n",
        "\n",
        "Lazy initialization is enabled, so please make sure you have set all the required configuration for this method call.\n",
        "\n",
        "Original exception:\n",
        "\tTypeError: an integer is required\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear2.input_dim = linear.get_dim('output')\n",
      "linear2.apply(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "linear2_apply_output"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The life-cycle of a brick"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Bricks go through a series of steps:\n",
      "\n",
      "1. Lazy construction\n",
      "2. Push allocation configuration to children\n",
      "3. Allocate parameters\n",
      "4. Push initialization configuration to children\n",
      "5. Initialize\n",
      "\n",
      "We constructed `linear` and when we applied it, its parameters were allocated, but not initialized yet."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear.params[0].get_value()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],\n",
        "       [ 0.,  0.,  0., ...,  0.,  0.,  0.],\n",
        "       [ 0.,  0.,  0., ...,  0.,  0.,  0.],\n",
        "       ..., \n",
        "       [ 0.,  0.,  0., ...,  0.,  0.,  0.],\n",
        "       [ 0.,  0.,  0., ...,  0.,  0.,  0.],\n",
        "       [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32)"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear.initialize()\n",
      "linear.params[0].get_value()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "array([[ 0.00764556, -0.01124291, -0.00137316, ..., -0.01263869,\n",
        "         0.00134491,  0.01582105],\n",
        "       [ 0.01048289,  0.00092199,  0.01072868, ...,  0.00715281,\n",
        "        -0.00843766,  0.00726915],\n",
        "       [-0.00300742, -0.00752019,  0.00470734, ...,  0.00323462,\n",
        "         0.00965908,  0.00906018],\n",
        "       ..., \n",
        "       [ 0.00111521, -0.01245422,  0.00787796, ..., -0.01080949,\n",
        "         0.01261017,  0.01154481],\n",
        "       [ 0.01984384,  0.01118707, -0.00434839, ...,  0.00457807,\n",
        "        -0.0063393 ,  0.00124276],\n",
        "       [-0.01965484,  0.00333865, -0.00197775, ..., -0.00581992,\n",
        "         0.0002461 ,  0.00671258]], dtype=float32)"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Brick hierarchy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Bricks can have children. They can be constructed automatically (as the linear transformations here) or given as arguments (as the activations here). A parent can configure its children, e.g. set their input and output dimensions and their initialization schemes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.bricks import MLP, Rectifier\n",
      "from blocks.initialization import Constant\n",
      "mlp = MLP(dims=[64, 32, 16], activations=[Rectifier(), Rectifier()],\n",
      "          weights_init=IsotropicGaussian(), biases_init=Constant(0))\n",
      "\n",
      "mlp.children"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "[<blocks.bricks.Linear object at 0x7f844d358dd0: name=linear_0>,\n",
        " <blocks.bricks.Rectifier object at 0x7f844d358690: name=rectifier>,\n",
        " <blocks.bricks.Linear object at 0x7f844d358c90: name=linear_1>,\n",
        " <blocks.bricks.Rectifier object at 0x7f844d358610: name=rectifier>]"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The children have been constructed, but the configuration for their allocation hasn't been \"pushed\" yet. This happens automatically on `apply`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print \"Input dimension:\", mlp.children[0].get_dim('input_')\n",
      "y = mlp.apply(x)\n",
      "print \"Input dimension:\", mlp.children[0].get_dim('input_')\n",
      "\n",
      "print \"Initialization scheme:\", mlp.children[0].weights_init\n",
      "mlp.initialize()\n",
      "print \"Initialization scheme:\", mlp.children[0].weights_init.__class__.__name__"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Input dimension: None\n",
        "Input dimension: 64\n",
        "Initialization scheme: None\n",
        "Initialization scheme: IsotropicGaussian\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can intervene between the pushing of the configuration, and the allocation/initialization."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "mlp = MLP(dims=[64, 32, 16], activations=[Rectifier(), Rectifier()],\n",
      "          weights_init=IsotropicGaussian(), biases_init=Constant(0))\n",
      "y = mlp.apply(x)\n",
      "mlp.push_initialization_config()\n",
      "mlp.children[0].weights_init = IsotropicGaussian(10.)\n",
      "mlp.initialize()\n",
      "print \"Layer 1 stdev:\", mlp.children[0].params[0].get_value().std()\n",
      "print \"Layer 2 stdev:\", mlp.children[2].params[0].get_value().std()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Layer 1 stdev: 9.98742\n",
        "Layer 2 stdev: 1.0207\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The anatomy of a brick"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A brick has \"applications\", like `Linear.apply`. In most cases, the primary application is called `apply`. Some bricks have more than one application.\n",
      "\n",
      "Applications can have class attributes, instance attributes, and properties, just like normal classes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Linear.apply.outputs"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "['output']"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear2.apply.outputs = ['another_output']\n",
      "linear2.apply(x, return_dict=True)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "OrderedDict([('another_output', linear2_apply_another_output)])"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear.apply(x, return_dict=True)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "OrderedDict([('output', linear_apply_output)])"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Blocks is just Theano"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "y = linear.apply(x)\n",
      "z = tensor.sqrt(y)\n",
      "w = linear2.apply(z)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 13
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Bricks to build with"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "There is a collection of basic bricks to build feedforward networks and convolutional networks\\*, as well as a variety of RNN bricks (GRU, LSTM\\*, sequence generators, attention mechanisms).\n",
      "\n",
      "**Almost!*"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "# Graph annotation\n",
      "---"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Bricks annotate the Theano graph by attaching helpful information to the `tag` attribute of variables. This annotation happens on two levels:\n",
      "\n",
      "1. The brick itself can carry annotations.\n",
      "2. Each application call (each time an application is applied) adds annotations too.\n",
      "\n",
      "One type of annotation is that we tag variables with the roles they play in the graph."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "y = linear.apply(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "y.tag.roles"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "[OUTPUT]"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear.params[0].tag.roles"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 16,
       "text": [
        "[WEIGHTS]"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We also add \"auxiliary variables\", which are variables that we might want to monitor, use as a regularizer, or otherwise might be interesting."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "linear.auxiliary_variables"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "[W_norm]"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Computation graph interface and variable filtering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Blocks provides a convenience class, `ComputationGraph`, which walks the Theano graph for you and collects all of these annotations."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.graph import ComputationGraph\n",
      "y = mlp.apply(x)\n",
      "cg = ComputationGraph(y)\n",
      "cg.auxiliary_variables"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 18,
       "text": [
        "[W_norm, b_norm, W_norm, b_norm]"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "For large models we want an easy way to retrieve the variables we want to monitor, train, etc. To do so, we can filter the variables in the graph by their roles as well as by the brick that created them."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.filter import VariableFilter\n",
      "from blocks.roles import WEIGHTS\n",
      "VariableFilter(roles=[WEIGHTS])(cg.variables)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "[W, W]"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "VariableFilter(roles=[WEIGHTS], bricks=[mlp.children[2]])(cg.variables)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "[W]"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "# Datasets\n",
      "---"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Blocks has a high level of abstraction when it comes to dealing with data. It takes some getting used to, but it's very powerful!\n",
      "\n",
      "\"Datasets\" are *stateless* classes that define an interface to the data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "with open('corpus.txt', 'w') as f:\n",
      "    for line in range(5):\n",
      "        f.write(\"This is line {}\\n\".format(line + 1))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.datasets import Dataset\n",
      "\n",
      "class TutorialDataset(Dataset):\n",
      "    def __init__(self, path):\n",
      "        self.path = path\n",
      "        \n",
      "    def open(self):\n",
      "        return open(self.path)\n",
      "        \n",
      "    def get_data(self, state=None, request=None):\n",
      "        line = state.readline()\n",
      "        if not line:\n",
      "            raise StopIteration\n",
      "        return line"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 22
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To iterate over the data we initialize a *data stream*, which manages the state of the iteration. We can ask the data stream for a single iteration (epoch) over the data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.datasets.streams import DataStream\n",
      "from itertools import islice\n",
      "\n",
      "dataset = TutorialDataset('corpus.txt')\n",
      "stream = DataStream(dataset)\n",
      "epoch = stream.get_epoch_iterator()\n",
      "for batch in islice(epoch, 3):\n",
      "    print batch,"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "This is line 1\n",
        "This is line 2\n",
        "This is line 3\n"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We're in the middle of our file, but now we want to start a validation run, so we create a new data stream with a new iterator."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "another_stream = DataStream(dataset)\n",
      "another_epoch = another_stream.get_epoch_iterator()\n",
      "for batch in another_epoch:\n",
      "    print batch,"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "This is line 1\n",
        "This is line 2\n",
        "This is line 3\n",
        "This is line 4\n",
        "This is line 5\n"
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The states don't interfere, so we can continue where we left off with the old one."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for batch in epoch:\n",
      "    print batch,"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "This is line 4\n",
        "This is line 5\n"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Iteration scheme"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The dataset we implemented returns only one sample at a time, but sometimes we want to iterate in more complicated patterns. For this, datasets can support \"requests\" for data, which are generated by a \"request iterator\"."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.datasets.schemes import ShuffledScheme\n",
      "\n",
      "shuffled_scheme = ShuffledScheme(num_examples=6, batch_size=2)\n",
      "request_iterator = shuffled_scheme.get_request_iterator()\n",
      "for request in request_iterator:\n",
      "    print request"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[2, 1]\n",
        "[4, 0]\n",
        "[3, 5]\n"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy\n",
      "\n",
      "class RandomDataset(Dataset):\n",
      "    def __init__(self):\n",
      "        self.data = numpy.arange(12).reshape(6, 2)\n",
      "        \n",
      "    def get_data(self, state=None, request=None):\n",
      "        return self.data[request]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dataset = RandomDataset()\n",
      "stream = DataStream(dataset, iteration_scheme=shuffled_scheme)\n",
      "epoch = stream.get_epoch_iterator()\n",
      "for batch in epoch:\n",
      "    print batch"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[8 9]\n",
        " [4 5]]\n",
        "[[ 0  1]\n",
        " [10 11]]\n",
        "[[6 7]\n",
        " [2 3]]\n"
       ]
      }
     ],
     "prompt_number": 28
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Chaining wrapper data streams"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Data stream wrappers take a data stream or another data stream wrapper as an input. They're just iterators that retrieve data from the iterator that they wrap, process it, and return it.\n",
      "\n",
      "A machine translation dataset pipeline would look something like this: datasets $\\rightarrow$ merge $\\rightarrow$ batch $\\rightarrow$ sort $\\rightarrow$ cache $\\rightarrow$ pad"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "---\n",
      "# The main loop\n",
      "---"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Blocks provides a very bare-bones main loop, which basically does two things:\n",
      "\n",
      "1. It requests epochs from the data stream, and feeds batches from each epoch into the training algorithm's `process_batch` method.\n",
      "2. At a variety of points (`before_epoch`, `after_batch`, `after_n_epochs`, `every_n_batches`, `on_interruption`, `on_resume`, etc.) it runs a set of extensions."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from blocks.bricks import Tanh, Softmax\n",
      "from blocks.bricks.cost import CategoricalCrossEntropy, MisclassificationRate\n",
      "from blocks.datasets.mnist import MNIST\n",
      "from blocks.algorithms import GradientDescent, SteepestDescent\n",
      "from blocks.extensions import FinishAfter, Printing\n",
      "from blocks.extensions.monitoring import TrainingDataMonitoring, DataStreamMonitoring\n",
      "from blocks.main_loop import MainLoop\n",
      "from blocks.monitoring import aggregation"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 29
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Construct the bricks and initialize their parameters\n",
      "mlp = MLP([Tanh(), Softmax()], [784, 100, 10],\n",
      "          weights_init=IsotropicGaussian(0.01),\n",
      "          biases_init=Constant(0))\n",
      "mlp.initialize()\n",
      "\n",
      "# Create the Theano computation graph\n",
      "x = tensor.matrix('features')\n",
      "y = tensor.lmatrix('targets')\n",
      "probs = mlp.apply(x)\n",
      "cost = CategoricalCrossEntropy().apply(y.flatten(), probs)\n",
      "error_rate = MisclassificationRate().apply(y.flatten(), probs)\n",
      "\n",
      "# Create a managed computation graph for easy access to the weights\n",
      "cg = ComputationGraph([cost])\n",
      "W1, W2 = VariableFilter(roles=[WEIGHTS])(cg.variables)\n",
      "cost = cost + .00005 * (W1 ** 2).sum() + .00005 * (W2 ** 2).sum()\n",
      "cost.name = 'final_cost'\n",
      "\n",
      "# Initialize datasets\n",
      "mnist_train = MNIST(\"train\")\n",
      "mnist_test = MNIST(\"test\")\n",
      "\n",
      "# Initialize the training algorithm\n",
      "algorithm = GradientDescent(\n",
      "    cost=cost, step_rule=SteepestDescent(learning_rate=0.1))\n",
      "\n",
      "# The main loop\n",
      "main_loop = MainLoop(\n",
      "    mlp, # The main loop doesn't use this, but extensions theoretically can\n",
      "    DataStream(mnist_train,\n",
      "               iteration_scheme=ShuffledScheme(\n",
      "                   mnist_train.num_examples, 50)),\n",
      "    algorithm,\n",
      "    extensions=[FinishAfter(after_n_epochs=5),\n",
      "                DataStreamMonitoring(\n",
      "                    [cost, error_rate],\n",
      "                    DataStream(mnist_test,\n",
      "                               iteration_scheme=ShuffledScheme(\n",
      "                                   mnist_test.num_examples, 500)),\n",
      "                    prefix=\"test\"),\n",
      "                TrainingDataMonitoring(\n",
      "                    [cost, error_rate,\n",
      "                     aggregation.mean(algorithm.total_gradient_norm)],\n",
      "                    prefix=\"train\",\n",
      "                    after_every_epoch=True),\n",
      "                Printing()])\n",
      "main_loop.run()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "-------------------------------------------------------------------------------\n",
        "BEFORE FIRST EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 0\n",
        "\t iterations_done: 0\n",
        "Log records from the iteration 0:\n",
        "\t test_final_cost: 2.30414938927\n",
        "\t test_misclassificationrate_apply_error_rate: 0.918099993467\n",
        "\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-------------------------------------------------------------------------------\n",
        "AFTER ANOTHER EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 1\n",
        "\t iterations_done: 1200\n",
        "Log records from the iteration 1200:\n",
        "\t test_final_cost: 0.271812617779\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0780999995768\n",
        "\t train_final_cost: 0.492633551359\n",
        "\t train_misclassificationrate_apply_error_rate: 0.132183332732\n",
        "\t train_total_gradient_norm: 0.717878103256\n",
        "\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-------------------------------------------------------------------------------\n",
        "AFTER ANOTHER EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 2\n",
        "\t iterations_done: 2400\n",
        "Log records from the iteration 2400:\n",
        "\t test_final_cost: 0.200972408056\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0575999999419\n",
        "\t train_final_cost: 0.242820963264\n",
        "\t train_misclassificationrate_apply_error_rate: 0.0693166658065\n",
        "\t train_total_gradient_norm: 0.699945986271\n",
        "\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-------------------------------------------------------------------------------\n",
        "AFTER ANOTHER EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 3\n",
        "\t iterations_done: 3600\n",
        "Log records from the iteration 3600:\n",
        "\t test_final_cost: 0.165933743119\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0464000005275\n",
        "\t train_final_cost: 0.188066735864\n",
        "\t train_misclassificationrate_apply_error_rate: 0.0523499991683\n",
        "\t train_total_gradient_norm: 0.636475622654\n",
        "\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-------------------------------------------------------------------------------\n",
        "AFTER ANOTHER EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 4\n",
        "\t iterations_done: 4800\n",
        "Log records from the iteration 4800:\n",
        "\t test_final_cost: 0.147225037217\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0399999997579\n",
        "\t train_final_cost: 0.155039444566\n",
        "\t train_misclassificationrate_apply_error_rate: 0.0428499992006\n",
        "\t train_total_gradient_norm: 0.59491789341\n",
        "\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-------------------------------------------------------------------------------\n",
        "AFTER ANOTHER EPOCH\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 5\n",
        "\t iterations_done: 6000\n",
        "Log records from the iteration 6000:\n",
        "\t test_final_cost: 0.129283457994\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0353000001051\n",
        "\t train_final_cost: 0.134246379137\n",
        "\t train_misclassificationrate_apply_error_rate: 0.0362999992725\n",
        "\t train_total_gradient_norm: 0.563258886337\n",
        "\t training_finish_requested: True\n",
        "\n",
        "\n",
        "-------------------------------------------------------------------------------\n",
        "TRAINING HAS BEEN FINISHED:\n",
        "-------------------------------------------------------------------------------\n",
        "Training status:\n",
        "\t epochs_done: 5\n",
        "\t iterations_done: 6000\n",
        "Log records from the iteration 6000:\n",
        "\t test_final_cost: 0.129283457994\n",
        "\t test_misclassificationrate_apply_error_rate: 0.0353000001051\n",
        "\t train_final_cost: 0.134246379137\n",
        "\t train_misclassificationrate_apply_error_rate: 0.0362999992725\n",
        "\t train_total_gradient_norm: 0.563258886337\n",
        "\t training_finish_requested: True\n",
        "\t training_finished: True\n",
        "\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "/home/vanmerb/lisa/blocks/blocks/log.py:147: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.\n",
        "  if value != default_value:\n"
       ]
      }
     ],
     "prompt_number": 30
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Monitoring"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Blocks supports two kinds of monitoring:\n",
      "\n",
      "1. `DataStreamMonitoring` performs monitoring on a separate data stream, e.g. a validation or test set\n",
      "2. `TrainingDataMonitoring` monitors values on each batch and is compiled together with the training function"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Aggregation"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "An aggregation scheme describes how a monitored variable should be calculated over batches in a data stream."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "example_mean = aggregation.mean(cost, x.shape[0])  # Average over examples\n",
      "batch_mean = aggregation.mean(cost, 1)  # Average over batches"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 31
    }
   ],
   "metadata": {}
  }
 ]
 }
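
The aggregation cell that closes the notebook may be easier to follow with a worked example. Below is a minimal sketch in plain Python, not the Blocks implementation. It assumes that `aggregation.mean(numerator, denominator)` accumulates the two quantities separately over batches and divides once at the end. The helper `aggregate_mean` and the batch records are hypothetical, and the cost is taken to be summed over each batch.

```python
from __future__ import division, print_function


def aggregate_mean(records):
    """Sum numerators and denominators separately, then divide once.

    `records` is a list of (numerator, denominator) pairs, one per batch.
    This mirrors the assumed behaviour of the aggregation scheme only.
    """
    numerator_total = sum(numerator for numerator, _ in records)
    denominator_total = sum(denominator for _, denominator in records)
    return numerator_total / denominator_total


# Hypothetical (summed batch cost, batch size) pairs; the last batch is smaller.
batches = [(25.0, 50), (20.0, 50), (2.0, 20)]

# Denominator = batch size: every example carries equal weight.
example_mean = aggregate_mean(batches)

# Denominator = 1: every batch carries equal weight, whatever its size.
batch_mean = aggregate_mean([(cost, 1) for cost, _ in batches])

print("Average over examples:", example_mean)
print("Average over batches:", batch_mean)
```

With unequal batch sizes the two denominators weight the data differently: the first weighs every example equally, the second weighs every batch equally.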