mspan · September 26, 2013 00:10
diff --git a/RandomForestsInTen_final.ipynb b/RandomForestsInTen_final.ipynb
 {
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# Random Forests in (about) 10 Minutes\n",
      "\n",
      "### Mike Spaner\n",
      "-  @mspan\n",
      "-  blog: www.datascientist.co\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The random forest algorithm was selected to build the prediction model. Breiman and Cutler [2] summarize the following benefits of using random forests:\n",
      "\n",
      "### Features of Random Forests\n",
      "- It is unexcelled in accuracy among current algorithms.\n",
      "- It can handle thousands of input variables without variable deletion.\n",
      "- It gives estimates of what variables are important in the classification.\n",
      "- It has methods for balancing error in class population unbalanced data sets.\n",
      "- It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) give interesting views of the data.\n",
      "- It offers an experimental method for detecting variable interactions.\n",
      "\n",
      "http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm [2]"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Populating the interactive namespace from numpy and matplotlib\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.ensemble import RandomForestClassifier\n",
      "from sklearn.tree import DecisionTreeClassifier\n",
      "import pandas as pd\n",
      "import numpy as np"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Modern cell phones are equipped with a variety of sensors, including GPS receivers, triaxial accelerometers, triaxial gyroscopes, microphones, and multiple cameras. This portable, personal, low-cost sensor network is used for a wide range of purposes, including location tracking (GPS), voice recognition and search (microphone), fitness tracking (accelerometers, gyroscopes, GPS), and eye tracking (cameras).\n",
      "\n",
      "Much research is being conducted to build functions that predict human activities, such as whether an individual is lying down, walking, or climbing stairs. This analysis develops a function that predicts which of six activities a person is performing: walking, walking upstairs, walking downstairs, standing, sitting, or lying down (labeled `laying` in the data). It uses a freely available dataset from the UCI Machine Learning Repository [1].\n",
      "\n",
      "The predicted activities are all associated with bulk movement of the human body, so the phone's motion sensors (accelerometer and gyroscope) are the primary indicators. The data in [1] was collected from test subjects wearing waist-mounted Samsung cell phones.\n",
      "\n"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Load the Samsung phone data:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData = pd.read_csv('./samsungData.csv')\n",
      "samsungData = samsungData.drop(['Unnamed: 0'], axis=1) # drop the index numbers in the dataset"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.columns[:2]\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "Index([u'tBodyAcc-mean()-X', u'tBodyAcc-mean()-Y'], dtype=object)"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Look at the first 50 columns\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.columns[:50]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "Index([u'tBodyAcc-mean()-X', u'tBodyAcc-mean()-Y', u'tBodyAcc-mean()-Z', u'tBodyAcc-std()-X', u'tBodyAcc-std()-Y', u'tBodyAcc-std()-Z', u'tBodyAcc-mad()-X', u'tBodyAcc-mad()-Y', u'tBodyAcc-mad()-Z', u'tBodyAcc-max()-X', u'tBodyAcc-max()-Y', u'tBodyAcc-max()-Z', u'tBodyAcc-min()-X', u'tBodyAcc-min()-Y', u'tBodyAcc-min()-Z', u'tBodyAcc-sma()', u'tBodyAcc-energy()-X', u'tBodyAcc-energy()-Y', u'tBodyAcc-energy()-Z', u'tBodyAcc-iqr()-X', u'tBodyAcc-iqr()-Y', u'tBodyAcc-iqr()-Z', u'tBodyAcc-entropy()-X', u'tBodyAcc-entropy()-Y', u'tBodyAcc-entropy()-Z', u'tBodyAcc-arCoeff()-X,1', u'tBodyAcc-arCoeff()-X,2', u'tBodyAcc-arCoeff()-X,3', u'tBodyAcc-arCoeff()-X,4', u'tBodyAcc-arCoeff()-Y,1', u'tBodyAcc-arCoeff()-Y,2', u'tBodyAcc-arCoeff()-Y,3', u'tBodyAcc-arCoeff()-Y,4', u'tBodyAcc-arCoeff()-Z,1', u'tBodyAcc-arCoeff()-Z,2', u'tBodyAcc-arCoeff()-Z,3', u'tBodyAcc-arCoeff()-Z,4', u'tBodyAcc-correlation()-X,Y', u'tBodyAcc-correlation()-X,Z', u'tBodyAcc-correlation()-Y,Z', u'tGravityAcc-mean()-X', u'tGravityAcc-mean()-Y', u'tGravityAcc-mean()-Z', u'tGravityAcc-std()-X', u'tGravityAcc-std()-Y', u'tGravityAcc-std()-Z', u'tGravityAcc-mad()-X', u'tGravityAcc-mad()-Y', u'tGravityAcc-mad()-Z', u'tGravityAcc-max()-X'], dtype=object)"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Generate a frequency table:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData['activity'].value_counts()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 6,
       "text": [
        "laying      1407\n",
        "standing    1374\n",
        "sitting     1286\n",
        "walk        1226\n",
        "walkup      1073\n",
        "walkdown     986\n",
        "dtype: int64"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 7,
       "text": [
        "(7352, 563)"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.describe()\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stderr",
       "text": [
        "/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
        "\n",
        "  warnings.warn(d.msg, DeprecationWarning)\n"
       ]
      },
      {
       "html": [
        "<pre>\n",
        "&lt;class 'pandas.core.frame.DataFrame'&gt;\n",
        "Index: 8 entries, count to max\n",
        "Columns: 562 entries, tBodyAcc-mean()-X to subject\n",
        "dtypes: float64(562)\n",
        "</pre>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "<class 'pandas.core.frame.DataFrame'>\n",
        "Index: 8 entries, count to max\n",
        "Columns: 562 entries, tBodyAcc-mean()-X to subject\n",
        "dtypes: float64(562)"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "#### Take a peek at the last few columns:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.columns[550:563]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "Index([u'fBodyBodyGyroJerkMag-maxInds', u'fBodyBodyGyroJerkMag-meanFreq()', u'fBodyBodyGyroJerkMag-skewness()', u'fBodyBodyGyroJerkMag-kurtosis()', u'angle(tBodyAccMean,gravity)', u'angle(tBodyAccJerkMean),gravityMean)', u'angle(tBodyGyroMean,gravityMean)', u'angle(tBodyGyroJerkMean,gravityMean)', u'angle(X,gravityMean)', u'angle(Y,gravityMean)', u'angle(Z,gravityMean)', u'subject', u'activity'], dtype=object)"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We are not going to use the subject ID in this tutorial analysis, so let's drop it from the dataframe. A more rigorous evaluation would probably hold out specific subjects, so that no subject appears in both the training and test sets."
     ]
    },
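    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A subject-wise split could be sketched as follows. This is a minimal illustration on a small made-up frame, not part of the analysis in this notebook: `toy`, `rng`, `test_subjects`, the seed, and the choice of holding out two subjects are all assumptions. The idea is that every row from a held-out subject goes to the test set, so no subject contributes to both sets."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "import pandas as pd\n",
      "\n",
      "# Hypothetical toy data: 6 subjects with 4 rows each (not the Samsung data).\n",
      "toy = pd.DataFrame({'subject': np.repeat(np.arange(1, 7), 4),\n",
      "                    'feature': np.arange(24, dtype=float)})\n",
      "\n",
      "# Pick two whole subjects for the test set; the seed is arbitrary.\n",
      "rng = np.random.RandomState(0)\n",
      "test_subjects = rng.choice(toy['subject'].unique(), size=2, replace=False)\n",
      "\n",
      "# Rows follow their subject: held-out subjects go entirely to the test set.\n",
      "in_test = toy['subject'].isin(test_subjects)\n",
      "toy_train, toy_test = toy[~in_test], toy[in_test]\n",
      "\n",
      "# No subject should appear on both sides of the split.\n",
      "assert set(toy_train['subject']).isdisjoint(set(toy_test['subject']))\n",
      "toy_train.shape, toy_test.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },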
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData = samsungData.drop([u'subject'], axis=1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.columns[550:563]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "Index([u'fBodyBodyGyroJerkMag-maxInds', u'fBodyBodyGyroJerkMag-meanFreq()', u'fBodyBodyGyroJerkMag-skewness()', u'fBodyBodyGyroJerkMag-kurtosis()', u'angle(tBodyAccMean,gravity)', u'angle(tBodyAccJerkMean),gravityMean)', u'angle(tBodyGyroMean,gravityMean)', u'angle(tBodyGyroJerkMean,gravityMean)', u'angle(X,gravityMean)', u'angle(Y,gravityMean)', u'angle(Z,gravityMean)', u'activity'], dtype=object)"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "samsungData.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "(7352, 562)"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
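      "df = samsungData  # alias, not a copy: new columns are added to samsungData too\n",
      "df['catActivity'] = pd.Categorical.from_array(df.activity)  # activity label as a categorical target\n",
      "df['catActivity'].head(5)"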
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "0    standing\n",
        "1    standing\n",
        "2    standing\n",
        "3    standing\n",
        "4    standing\n",
        "Name: catActivity, dtype: object"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "thanks to http://blog.yhathq.com/posts/random-forests-in-python.html\n",
      "\n",
      "### CREATE A RANDOM FOREST"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Randomly flag roughly 75% of the rows for training; the rest form the test set.\n",
      "df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "df['is_train']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "0      True\n",
        "1      True\n",
        "2      True\n",
        "3      True\n",
        "4      True\n",
        "5     False\n",
        "6      True\n",
        "7      True\n",
        "8     False\n",
        "9     False\n",
        "10     True\n",
        "11     True\n",
        "12     True\n",
        "13    False\n",
        "14     True\n",
        "...\n",
        "7337     True\n",
        "7338     True\n",
        "7339    False\n",
        "7340    False\n",
        "7341     True\n",
        "7342     True\n",
        "7343     True\n",
        "7344    False\n",
        "7345     True\n",
        "7346     True\n",
        "7347     True\n",
        "7348     True\n",
        "7349     True\n",
        "7350     True\n",
        "7351     True\n",
        "Name: is_train, Length: 7352, dtype: bool"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "train, test = df[df['is_train']], df[~df['is_train']]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "test.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "(1760, 564)"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
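      "features = df.columns[:561]  # the 561 sensor features; excludes activity, catActivity, is_train\n",
      "clf = RandomForestClassifier(n_estimators=300, n_jobs=-1)  # 300 trees, built on all available cores\n",
      "y = train['catActivity']\n"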
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "res = clf.fit(train[features], y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 19
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "preds = clf.predict(test[features])\n",
      "pd.crosstab(test['catActivity'], preds, rownames=['actual'], colnames=['preds'])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th>preds</th>\n",
        "      <th>laying</th>\n",
        "      <th>sitting</th>\n",
        "      <th>standing</th>\n",
        "      <th>walk</th>\n",
        "      <th>walkdown</th>\n",
        "      <th>walkup</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>actual</th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>laying</th>\n",
        "      <td> 322</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>sitting</th>\n",
        "      <td>   0</td>\n",
        "      <td> 290</td>\n",
        "      <td>   5</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>standing</th>\n",
        "      <td>   0</td>\n",
        "      <td>   9</td>\n",
        "      <td> 335</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walk</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td> 301</td>\n",
        "      <td>   0</td>\n",
        "      <td>   2</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walkdown</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   4</td>\n",
        "      <td> 231</td>\n",
        "      <td>   4</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walkup</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   2</td>\n",
        "      <td> 255</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "preds     laying  sitting  standing  walk  walkdown  walkup\n",
        "actual                                                     \n",
        "laying       322        0         0     0         0       0\n",
        "sitting        0      290         5     0         0       0\n",
        "standing       0        9       335     0         0       0\n",
        "walk           0        0         0   301         0       2\n",
        "walkdown       0        0         0     4       231       4\n",
        "walkup         0        0         0     0         2     255"
       ]
      }
     ],
     "prompt_number": 20
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "### CREATE A DECISION TREE"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "clf_decision_tree = DecisionTreeClassifier(random_state=1234)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 21
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "res_decision_tree = clf_decision_tree.fit(train[features], y)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "preds_decision_tree = clf_decision_tree.predict(test[features])\n",
      "pd.crosstab(test['catActivity'], preds_decision_tree, rownames=['actual'], colnames=['preds'])\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th>preds</th>\n",
        "      <th>laying</th>\n",
        "      <th>sitting</th>\n",
        "      <th>standing</th>\n",
        "      <th>walk</th>\n",
        "      <th>walkdown</th>\n",
        "      <th>walkup</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>actual</th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>laying</th>\n",
        "      <td> 322</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>sitting</th>\n",
        "      <td>   0</td>\n",
        "      <td> 275</td>\n",
        "      <td>  20</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>standing</th>\n",
        "      <td>   0</td>\n",
        "      <td>  30</td>\n",
        "      <td> 314</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walk</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td> 282</td>\n",
        "      <td>   9</td>\n",
        "      <td>  12</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walkdown</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>  13</td>\n",
        "      <td> 214</td>\n",
        "      <td>  12</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>walkup</th>\n",
        "      <td>   0</td>\n",
        "      <td>   0</td>\n",
        "      <td>   1</td>\n",
        "      <td>   8</td>\n",
        "      <td>   8</td>\n",
        "      <td> 240</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "preds     laying  sitting  standing  walk  walkdown  walkup\n",
        "actual                                                     \n",
        "laying       322        0         0     0         0       0\n",
        "sitting        0      275        20     0         0       0\n",
        "standing       0       30       314     0         0       0\n",
        "walk           0        0         0   282         9      12\n",
        "walkdown       0        0         0    13       214      12\n",
        "walkup         0        0         1     8         8     240"
       ]
      }
     ],
     "prompt_number": 23
    },
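    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The two confusion matrices can each be reduced to a single accuracy figure. The sketch below copies the counts from the crosstab outputs recorded in this notebook (they will differ on a rerun, since the split is random) and divides the diagonal by the total; `cm_forest`, `cm_tree`, and the helper names are introduced here for illustration only. For the recorded run this works out to roughly 0.985 for the random forest versus 0.936 for the single decision tree."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "\n",
      "# Counts copied from the recorded crosstabs (rows = actual, columns = predicted),\n",
      "# in the order laying, sitting, standing, walk, walkdown, walkup.\n",
      "cm_forest = np.array([[322,   0,   0,   0,   0,   0],\n",
      "                      [  0, 290,   5,   0,   0,   0],\n",
      "                      [  0,   9, 335,   0,   0,   0],\n",
      "                      [  0,   0,   0, 301,   0,   2],\n",
      "                      [  0,   0,   0,   4, 231,   4],\n",
      "                      [  0,   0,   0,   0,   2, 255]])\n",
      "cm_tree = np.array([[322,   0,   0,   0,   0,   0],\n",
      "                    [  0, 275,  20,   0,   0,   0],\n",
      "                    [  0,  30, 314,   0,   0,   0],\n",
      "                    [  0,   0,   0, 282,   9,  12],\n",
      "                    [  0,   0,   0,  13, 214,  12],\n",
      "                    [  0,   0,   1,   8,   8, 240]])\n",
      "\n",
      "def accuracy(cm):\n",
      "    # Fraction of test rows on the diagonal, i.e. predicted correctly.\n",
      "    return np.trace(cm) / float(cm.sum())\n",
      "\n",
      "def recall_per_class(cm):\n",
      "    # Correct predictions for each actual class divided by that class's row total.\n",
      "    return np.diag(cm) / cm.sum(axis=1).astype(float)\n",
      "\n",
      "accuracy(cm_forest), accuracy(cm_tree)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },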
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pd.DataFrame(clf_decision_tree.feature_importances_).sort(columns = [0], axis=0, ascending = False).head(15)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>0</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>52 </th>\n",
        "      <td> 0.233659</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>389</th>\n",
        "      <td> 0.200651</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>559</th>\n",
        "      <td> 0.144526</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>504</th>\n",
        "      <td> 0.109829</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>69 </th>\n",
        "      <td> 0.096218</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>57 </th>\n",
        "      <td> 0.028644</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>159</th>\n",
        "      <td> 0.018791</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>37 </th>\n",
        "      <td> 0.010430</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>65 </th>\n",
        "      <td> 0.010020</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>132</th>\n",
        "      <td> 0.008732</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>450</th>\n",
        "      <td> 0.008613</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>451</th>\n",
        "      <td> 0.006607</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>432</th>\n",
        "      <td> 0.005939</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>157</th>\n",
        "      <td> 0.005739</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>418</th>\n",
        "      <td> 0.004633</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 24,
       "text": [
        "            0\n",
        "52   0.233659\n",
        "389  0.200651\n",
        "559  0.144526\n",
        "504  0.109829\n",
        "69   0.096218\n",
        "57   0.028644\n",
        "159  0.018791\n",
        "37   0.010430\n",
        "65   0.010020\n",
        "132  0.008732\n",
        "450  0.008613\n",
        "451  0.006607\n",
        "432  0.005939\n",
        "157  0.005739\n",
        "418  0.004633"
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "z = pd.DataFrame(clf.feature_importances_)\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "z_sorted = z.sort(columns = [0], axis=0, ascending = False)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 26
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "z_sorted.head(15)\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>0</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>52 </th>\n",
        "      <td> 0.032558</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>40 </th>\n",
        "      <td> 0.031021</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>56 </th>\n",
        "      <td> 0.030414</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>558</th>\n",
        "      <td> 0.026724</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>559</th>\n",
        "      <td> 0.026085</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>49 </th>\n",
        "      <td> 0.025362</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>53 </th>\n",
        "      <td> 0.024608</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>41 </th>\n",
        "      <td> 0.022748</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>50 </th>\n",
        "      <td> 0.022173</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>57 </th>\n",
        "      <td> 0.016317</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>381</th>\n",
        "      <td> 0.012155</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>42 </th>\n",
        "      <td> 0.011127</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>51 </th>\n",
        "      <td> 0.010070</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>389</th>\n",
        "      <td> 0.009222</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>201</th>\n",
        "      <td> 0.008905</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 27,
       "text": [
        "            0\n",
        "52   0.032558\n",
        "40   0.031021\n",
        "56   0.030414\n",
        "558  0.026724\n",
        "559  0.026085\n",
        "49   0.025362\n",
        "53   0.024608\n",
        "41   0.022748\n",
        "50   0.022173\n",
        "57   0.016317\n",
        "381  0.012155\n",
        "42   0.011127\n",
        "51   0.010070\n",
        "389  0.009222\n",
        "201  0.008905"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "num_important = 8\n",
      "important_factors = (df.columns[z_sorted.index[0:num_important]].values)\n",
      "important_factors\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "array(['tGravityAcc-min()-X', 'tGravityAcc-mean()-X',\n",
        "       'tGravityAcc-energy()-X', 'angle(X,gravityMean)',\n",
        "       'angle(Y,gravityMean)', 'tGravityAcc-max()-X',\n",
        "       'tGravityAcc-min()-Y', 'tGravityAcc-mean()-Y'], dtype=object)"
       ]
      }
     ],
     "prompt_number": 28
    },
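    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`feature_importances_` is indexed by column position, which is why the importance tables show bare integers. A minimal sketch of pairing each importance with its feature name follows. It runs on small synthetic data from `make_classification` rather than the Samsung data, and `X_demo`, `y_demo`, `demo_names`, and `demo_clf` are made-up names; in this notebook the analogous pairing would be `features` with `clf.feature_importances_`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import numpy as np\n",
      "from sklearn.datasets import make_classification\n",
      "from sklearn.ensemble import RandomForestClassifier\n",
      "\n",
      "# Hypothetical stand-in data: 200 rows, 6 features with made-up names.\n",
      "X_demo, y_demo = make_classification(n_samples=200, n_features=6,\n",
      "                                     n_informative=3, random_state=0)\n",
      "demo_names = ['f%d' % i for i in range(6)]\n",
      "demo_clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)\n",
      "\n",
      "# Sort column positions by importance (largest first), then attach the names.\n",
      "order = np.argsort(demo_clf.feature_importances_)[::-1]\n",
      "ranked = [(demo_names[i], demo_clf.feature_importances_[i]) for i in order]\n",
      "ranked[:3]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },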
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import HTML\n",
      "HTML(\"\"\"\n",
      "<style> \n",
      "\n",
      "div.cell {\n",
      "  width: 940px;\n",
      "  margin-left: auto;\n",
      "  margin-right: auto;\n",
      "}\n",
      "\n",
      ".rendered_html {\n",
      "  font-size: 100%;\n",
      "}\n",
      "\n",
      "</style>\"\"\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "\n",
        "<style> \n",
        "\n",
        "div.cell {\n",
        "  width: 940px;\n",
        "  margin-left: auto;\n",
        "  margin-right: auto;\n",
        "}\n",
        "\n",
        ".rendered_html {\n",
        "  font-size: 100%;\n",
        "}\n",
        "\n",
        "</style>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 29,
       "text": [
        "<IPython.core.display.HTML at 0x1078b9410>"
       ]
      }
     ],
     "prompt_number": 29
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## References\n",
      "\n",
      "1.  Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto. \n",
      "Smartlab - Non Linear Complex Systems Laboratory, \n",
      "DITEN - Universit\u00e0 degli Studi di Genova, Genoa I-16145, Italy. \n",
      "http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones\n",
      "2.  \u201cRandom Forests\u201d by Leo Breiman and Adele Cutler. Page URL (accessed March 4, 2013): http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm\n",
      "\n"
     ]
    }
   ],
   "metadata": {}
  }
 ]
 }