jamesmcm · July 3, 2014 12:33
diff --git a/gistfile1.txt b/gistfile1.txt
 {
 "metadata": {
  "celltoolbar": "Slideshow",
  "name": "",
  "signature": "sha256:68f7d12dcf5ee7656a300d7009496fbd40f486e1dcb06c46c9d48f015daa4dea"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "<script language=\"javascript\">\n",
      "\n",
      "        function MouseRollover(MyImage) {\n",
      "\n",
      "        MyImage.src = \"rfrplot.png\";\n",
      "        \n",
      "    }\n",
      "    \n",
      "        function MouseOut(MyImage) {\n",
      "        MyImage.src = \"gprplot.png\";\n",
      "    }\n",
      "</script>\n",
      "\n",
      "\n",
      "<center>\n",
      "\n",
      "<h2>DREAM9</h2>\n",
      "<h2>Acute Myeloid Leukemia Outcome Prediction Challenge</h2>\n",
      "<br>\n",
      "<h3>01/07/2014</h3>\n",
      "\n",
      "\n",
      "<h3> James McMurray </h3>\n",
      "PhD Student<br>\n",
      "MPI Intelligent Systems, T\u00fcbingen, Germany\n",
      "\n",
      "</center>\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "The DREAM Challenges\n",
      "------------------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* __D__ialogue for __R__everse __E__ngineering __A__ssessments and __M__ethods"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Challenges focus on Systems Biology"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Examples of previous challenges include inferring gene regulatory networks, and predicting breast cancer survival."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Sponsors:\n",
      "  * Columbia University Center for Multiscale Analysis of Genomic and Cellular Networks\n",
      "  * IBM Computational Biology Center\n",
      "  * The New York Academy of Sciences  \n",
      "  * NIH Roadmap Initiative\n",
      "  * Sage Bionetworks"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "DREAM9 Challenges\n",
      "-------------------\n",
      "\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Three DREAM9 Challenges   \n",
      "\n",
      "  * Alzheimer\u2019s Disease Big Data DREAM Challenge \\#1    \n",
      "  \n",
      "  * The Broad-DREAM Gene Essentiality Prediction Challenge    \n",
      "  \n",
      "  * The DREAM9 __Acute Myeloid Leukemia (AML) Outcome Prediction Challenge__     \n",
      "  \n",
      "    * Predict the outcome of treatment of AML patients (resistant or remission), their remission duration and overall survival based on clinical cytogenetics, known genetic markers and phosphoproteomic data.\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Chosen because its tasks seemed more intuitive (they don't require knowledge of medical imaging, etc.)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* No data access restrictions (unlike the Alzheimer's Disease challenge)\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Acute Myeloid Leukemia\n",
      "------------------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Acute Myeloid Leukemia is a particularly lethal type of leukemia."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Affects the myeloid cells."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* In 2014, there are projected to be ~18,000 new cases of AML, and ~10,000 deaths from the disease. \n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Only approximately a quarter of the patients diagnosed with AML survive beyond 5 years."
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Acute Myeloid Leukemia Outcome Prediction Challenge\n",
      "-----------------------------------------------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Participants are given data on AML patients including 40 clinical correlates and the expression levels of 231 proteins and phosphoproteins probed by reverse phase protein array analysis.\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Note that the expression levels include some missing data.\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Challenge consists of three sub-challenges:\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* __Subchallenge 1__: Determine the best model to predict which AML patients will have Complete Remission or will be Primary Resistant\n",
      "  * Classification"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* __Subchallenge 2__: For patients who have Complete Remission, predict remission duration.\n",
      "  * Regression"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* __Subchallenge 3__: Predict the overall survival time for each patient\n",
      "  * Regression"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Random Forests\n",
      "------------------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* An ensemble of decision trees trained on bootstrapped samples - can be used for classification and regression"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "<img src=\"./dtree.gif\" style=\"width: 500px;\"/>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Decision trees are trained by greedily choosing splits which maximise information gain (classification) or minimise a loss such as squared error (regression)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Random Forest Regression example\n",
      "---------------------------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Can we just use this and be finished?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "<img src=\"./rfrplot.png\" style=\"width: 500px;\"/>"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Reasonable on observed data, but cannot extrapolate beyond the range of the training data"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Gaussian Process Regression\n",
      "----------------------------\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "<img id='hovim' src=\"./gprplot.png\" style=\"width: 500px;\" onMouseOver=\"MouseRollover(this)\" \n",
      "onMouseOut=\"MouseOut(this)\" />\n",
      "\n",
      "\n",
      "<!-- \n",
      "<img src=\"./gprplot.png\" style=\"width: 500px;\"/>\n",
      "-->"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* A suitable choice of assumptions allows prediction outside of the input data range"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Provides an uncertainty estimate too"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* How does it work?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Gaussian Processes\n",
      "----------------------------\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* A Bayesian non-parametric model"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Assumptions are encoded in the choice of _covariance function_\n",
      "  * The previous example used a periodic covariance function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Returns a distribution over functions - the family of functions is specified by the covariance function and its hyperparameters"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* The trick is in the correct choice of kernel (covariance) function"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Can also be used to impute missing values - conceptually by creating a GP for the input dimensions with missing values against the others"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "slide"
      }
     },
     "source": [
      "Conclusion\n",
      "-------------\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Currently participating in the DREAM9 AML Outcome Prediction Challenge\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* There are three sub-challenges, including classification and regression tasks\n",
      "\n",
      "\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* For the first submission, Random Forests were used for all tasks\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Gaussian Process approach seems promising:\n",
      "  * Can impute missing data reasonably\n",
      "  * Provides uncertainty estimates\n",
      "  * With a good choice of covariance function, should be able to provide good predictions over much of the data space"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "* Main task lies in correctly choosing the covariance functions\n",
      "  * Dealing with a mix of categorical and other data types, etc.\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "fragment"
      }
     },
     "source": [
      "<center>\n",
      "<strong>Thanks for your time!</strong>\n",
      "</center>\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {
      "slideshow": {
       "slide_type": "skip"
      }
     },
     "source": [
      "TODO\n",
      "======\n",
      "\n",
      "* Why a simple autoencoder can't be used\n",
      "* Add motivation section at start - dimensionality reduction\n",
      "* Use of RBMs for pretraining\n",
      "* Visualisation\n",
      "* Data whitening, etc.\n",
      "* Actual example - pre-training makes data linearly separable\n",
      "* Font size\n",
      "\n",
      "ipython nbconvert pres.ipynb --to slides --post serve\n",
      "\n",
      "\n"
     ]
    }
   ],
   "metadata": {}
  }
 ]
 }
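
The contrast drawn on the "Random Forest Regression example" and "Gaussian Process Regression" slides can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code behind `rfrplot.png` or `gprplot.png`: the toy sine data, the helper names (`fit_tree`, `fit_forest`, `periodic_kernel`, `gp_predict`) and all hyperparameter values are made up for the example, and with a single input feature the "forest" reduces to bagged regression trees (there is no feature subsampling to do).

```python
import math
import random

def fit_tree(xs, ys, min_leaf=3):
    """Fit a 1-D regression tree by minimising squared error; returns predict(x)."""
    pairs = sorted(zip(xs, ys))
    mean = sum(y for _, y in pairs) / len(pairs)
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical inputs
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return lambda x: mean  # leaf: predict the mean target
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
    lo = fit_tree(*zip(*pairs[:i]), min_leaf=min_leaf)
    hi = fit_tree(*zip(*pairs[i:]), min_leaf=min_leaf)
    return lambda x: lo(x) if x < threshold else hi(x)

def fit_forest(xs, ys, n_trees=25, seed=0):
    """Average trees fitted on bootstrap resamples of the data (bagging)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_tree([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)

def periodic_kernel(x1, x2, period=1.0, length_scale=1.0, variance=1.0):
    """Periodic (exp-sine-squared) covariance between two scalar inputs."""
    s = math.sin(math.pi * abs(x1 - x2) / period)
    return variance * math.exp(-2.0 * s * s / length_scale ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(xs, ys, x_star, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the point x_star."""
    n = len(xs)
    K = [[periodic_kernel(xs[i], xs[j], **kernel_args) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [periodic_kernel(x, x_star, **kernel_args) for x in xs]
    alpha = solve(K, list(ys))   # K^{-1} y
    v = solve(K, k_star)         # K^{-1} k_*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = periodic_kernel(x_star, x_star, **kernel_args) - sum(
        k * w for k, w in zip(k_star, v))
    return mean, var

xs = [0.1 * i for i in range(30)]               # training inputs in [0, 2.9]
ys = [math.sin(2 * math.pi * x) for x in xs]    # toy signal with period 1
forest = fit_forest(xs, ys)
x_new = 5.25                                    # well outside the training range
gp_mean, gp_var = gp_predict(xs, ys, x_new, period=1.0)
print("forest:", forest(x_new))
print("GP mean:", gp_mean, "GP std:", math.sqrt(max(gp_var, 0.0)))
```

Every split threshold in the trees lies inside the training range, so any input beyond the last training point falls into the same leaves and receives the same averaged value. The GP, by contrast, carries the periodic assumption past the data and also returns a variance for its prediction. That only helps when the assumed period roughly matches the data, which is why the choice of covariance function is the hard part.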