@mspan
Created September 26, 2013 00:10
Overview of Random Forests for newhaven.io presentation
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Random Forests in (about) 10 Minutes\n",
"\n",
"### Mike Spaner\n",
"- @mspan\n",
"- [email protected]\n",
"- blog: www.datascientist.co\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The random forest algorithm was selected to build the prediction model. Brieman and Cutler [2] nicely summarizes the following benefits of using random forests:\n",
"### Features of Random Forests\n",
"- It is unexcelled in accuracy among current algorithms.\n",
"- It can handle thousands of input variables without variable deletion.\n",
"- It gives estimates of what variables are important in the classification.\n",
"- It has methods for balancing error in class population unbalanced data sets.\n",
"- It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) give interesting views of the data.\n",
"- It offers an experimental method for detecting variable interactions.\u201d\n",
"\n",
"http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm [2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%pylab inline"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"Populating the interactive namespace from numpy and matplotlib\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"import pandas as pd\n",
"import numpy as np"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Modern cell phones are equipped with a variety of sensors including GPS sensors, triaxial accelerometers, triaxial gyroscopic sensors, multiple cameras. This portable, personal, low cost sensor network is used for a wide diversity of purposes including location tracking (GPS), voice recognition and search (microphone), fitness tracking ( accelerometers, gyroscopes, GPS), and eye-tracking ( cameras). \n",
"\n",
"Much research is being conducted to build functions that predict human activities such as whether an individual is laying down, walking, climbing stairs, etc. This data analysis develops a function that can be used to predict whether a person is walking, walking up or down stairs, standing, sitting, and laying down. Specifically, a freely available dataset from the UCI Machine Learning Repository [1] was used to predict these six human activities in this analysis.\n",
"\n",
"The activities predicted are all associated with bulk movement of the human body. Therefore the motion sensors of the cell phone are primary indicators(accelerometer and gyroscopic sensor data). The data in [1] was collected from test subjects wearing waist-mounted Samsung cell phones.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Load the samsung phone data: "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData = pd.read_csv('./samsungData.csv')\n",
"samsungData = samsungData.drop(['Unnamed: 0'], axis=1) # drop the index numbers in the dataset"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.columns[:2]\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"Index([u'tBodyAcc-mean()-X', u'tBodyAcc-mean()-Y'], dtype=object)"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Look at the first 50 columns\n"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.columns[:50]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"Index([u'tBodyAcc-mean()-X', u'tBodyAcc-mean()-Y', u'tBodyAcc-mean()-Z', u'tBodyAcc-std()-X', u'tBodyAcc-std()-Y', u'tBodyAcc-std()-Z', u'tBodyAcc-mad()-X', u'tBodyAcc-mad()-Y', u'tBodyAcc-mad()-Z', u'tBodyAcc-max()-X', u'tBodyAcc-max()-Y', u'tBodyAcc-max()-Z', u'tBodyAcc-min()-X', u'tBodyAcc-min()-Y', u'tBodyAcc-min()-Z', u'tBodyAcc-sma()', u'tBodyAcc-energy()-X', u'tBodyAcc-energy()-Y', u'tBodyAcc-energy()-Z', u'tBodyAcc-iqr()-X', u'tBodyAcc-iqr()-Y', u'tBodyAcc-iqr()-Z', u'tBodyAcc-entropy()-X', u'tBodyAcc-entropy()-Y', u'tBodyAcc-entropy()-Z', u'tBodyAcc-arCoeff()-X,1', u'tBodyAcc-arCoeff()-X,2', u'tBodyAcc-arCoeff()-X,3', u'tBodyAcc-arCoeff()-X,4', u'tBodyAcc-arCoeff()-Y,1', u'tBodyAcc-arCoeff()-Y,2', u'tBodyAcc-arCoeff()-Y,3', u'tBodyAcc-arCoeff()-Y,4', u'tBodyAcc-arCoeff()-Z,1', u'tBodyAcc-arCoeff()-Z,2', u'tBodyAcc-arCoeff()-Z,3', u'tBodyAcc-arCoeff()-Z,4', u'tBodyAcc-correlation()-X,Y', u'tBodyAcc-correlation()-X,Z', u'tBodyAcc-correlation()-Y,Z', u'tGravityAcc-mean()-X', u'tGravityAcc-mean()-Y', u'tGravityAcc-mean()-Z', u'tGravityAcc-std()-X', u'tGravityAcc-std()-Y', u'tGravityAcc-std()-Z', u'tGravityAcc-mad()-X', u'tGravityAcc-mad()-Y', u'tGravityAcc-mad()-Z', u'tGravityAcc-max()-X'], dtype=object)"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Generate a frequency table:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData['activity'].value_counts()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"laying 1407\n",
"standing 1374\n",
"sitting 1286\n",
"walk 1226\n",
"walkup 1073\n",
"walkdown 986\n",
"dtype: int64"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 7,
"text": [
"(7352, 563)"
]
}
],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.describe()\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n"
]
},
{
"html": [
"<pre>\n",
"&lt;class 'pandas.core.frame.DataFrame'&gt;\n",
"Index: 8 entries, count to max\n",
"Columns: 562 entries, tBodyAcc-mean()-X to subject\n",
"dtypes: float64(562)\n",
"</pre>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"<class 'pandas.core.frame.DataFrame'>\n",
"Index: 8 entries, count to max\n",
"Columns: 562 entries, tBodyAcc-mean()-X to subject\n",
"dtypes: float64(562)"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Take peak a the last few columns:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.columns[550:563]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"Index([u'fBodyBodyGyroJerkMag-maxInds', u'fBodyBodyGyroJerkMag-meanFreq()', u'fBodyBodyGyroJerkMag-skewness()', u'fBodyBodyGyroJerkMag-kurtosis()', u'angle(tBodyAccMean,gravity)', u'angle(tBodyAccJerkMean),gravityMean)', u'angle(tBodyGyroMean,gravityMean)', u'angle(tBodyGyroJerkMean,gravityMean)', u'angle(X,gravityMean)', u'angle(Y,gravityMean)', u'angle(Z,gravityMean)', u'subject', u'activity'], dtype=object)"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are not going to use the subject ID in this tutorial analysis - so lets drop it from the dataframe. It is probably more appropriate to segment specific subjects for the training and test sets. "
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData = samsungData.drop([u'subject'], axis=1)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.columns[550:563]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"Index([u'fBodyBodyGyroJerkMag-maxInds', u'fBodyBodyGyroJerkMag-meanFreq()', u'fBodyBodyGyroJerkMag-skewness()', u'fBodyBodyGyroJerkMag-kurtosis()', u'angle(tBodyAccMean,gravity)', u'angle(tBodyAccJerkMean),gravityMean)', u'angle(tBodyGyroMean,gravityMean)', u'angle(tBodyGyroJerkMean,gravityMean)', u'angle(X,gravityMean)', u'angle(Y,gravityMean)', u'angle(Z,gravityMean)', u'activity'], dtype=object)"
]
}
],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"samsungData.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"(7352, 562)"
]
}
],
"prompt_number": 12
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df = samsungData\n",
"df['catActivity'] = pd.Categorical.from_array(df.activity)\n",
"df['catActivity'].head (5)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"0 standing\n",
"1 standing\n",
"2 standing\n",
"3 standing\n",
"4 standing\n",
"Name: catActivity, dtype: object"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"thanks to http://blog.yhathq.com/posts/random-forests-in-python.html\n",
"\n",
"### CREATE A RANDOM FOREST"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"df['is_train']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 15,
"text": [
"0 True\n",
"1 True\n",
"2 True\n",
"3 True\n",
"4 True\n",
"5 False\n",
"6 True\n",
"7 True\n",
"8 False\n",
"9 False\n",
"10 True\n",
"11 True\n",
"12 True\n",
"13 False\n",
"14 True\n",
"...\n",
"7337 True\n",
"7338 True\n",
"7339 False\n",
"7340 False\n",
"7341 True\n",
"7342 True\n",
"7343 True\n",
"7344 False\n",
"7345 True\n",
"7346 True\n",
"7347 True\n",
"7348 True\n",
"7349 True\n",
"7350 True\n",
"7351 True\n",
"Name: is_train, Length: 7352, dtype: bool"
]
}
],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"train, test = df[df['is_train']==True], df[df['is_train']==False]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"test.shape"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 17,
"text": [
"(1760, 564)"
]
}
],
"prompt_number": 17
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"features = df.columns[:561]\n",
"clf = RandomForestClassifier(n_estimators = 300, n_jobs=-1)\n",
"y= train['catActivity']\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"res = clf.fit(train[features], y)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"preds = clf.predict(test[features])\n",
"pd.crosstab(test['catActivity'], preds, rownames=['actual'], colnames=['preds'])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>preds</th>\n",
" <th>laying</th>\n",
" <th>sitting</th>\n",
" <th>standing</th>\n",
" <th>walk</th>\n",
" <th>walkdown</th>\n",
" <th>walkup</th>\n",
" </tr>\n",
" <tr>\n",
" <th>actual</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>laying</th>\n",
" <td> 322</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sitting</th>\n",
" <td> 0</td>\n",
" <td> 290</td>\n",
" <td> 5</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>standing</th>\n",
" <td> 0</td>\n",
" <td> 9</td>\n",
" <td> 335</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walk</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 301</td>\n",
" <td> 0</td>\n",
" <td> 2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walkdown</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 4</td>\n",
" <td> 231</td>\n",
" <td> 4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walkup</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 2</td>\n",
" <td> 255</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 20,
"text": [
"preds laying sitting standing walk walkdown walkup\n",
"actual \n",
"laying 322 0 0 0 0 0\n",
"sitting 0 290 5 0 0 0\n",
"standing 0 9 335 0 0 0\n",
"walk 0 0 0 301 0 2\n",
"walkdown 0 0 0 4 231 4\n",
"walkup 0 0 0 0 2 255"
]
}
],
"prompt_number": 20
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CREATE A DECISION TREE"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"clf_decision_tree = DecisionTreeClassifier(random_state=1234)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 21
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"res_decision_tree = clf_decision_tree.fit(train[features], y)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 22
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"preds_decision_tree = clf_decision_tree.predict(test[features])\n",
"pd.crosstab(test['catActivity'], preds_decision_tree, rownames=['actual'], colnames=['preds'])\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>preds</th>\n",
" <th>laying</th>\n",
" <th>sitting</th>\n",
" <th>standing</th>\n",
" <th>walk</th>\n",
" <th>walkdown</th>\n",
" <th>walkup</th>\n",
" </tr>\n",
" <tr>\n",
" <th>actual</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>laying</th>\n",
" <td> 322</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>sitting</th>\n",
" <td> 0</td>\n",
" <td> 275</td>\n",
" <td> 20</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>standing</th>\n",
" <td> 0</td>\n",
" <td> 30</td>\n",
" <td> 314</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walk</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 282</td>\n",
" <td> 9</td>\n",
" <td> 12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walkdown</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 13</td>\n",
" <td> 214</td>\n",
" <td> 12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>walkup</th>\n",
" <td> 0</td>\n",
" <td> 0</td>\n",
" <td> 1</td>\n",
" <td> 8</td>\n",
" <td> 8</td>\n",
" <td> 240</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 23,
"text": [
"preds laying sitting standing walk walkdown walkup\n",
"actual \n",
"laying 322 0 0 0 0 0\n",
"sitting 0 275 20 0 0 0\n",
"standing 0 30 314 0 0 0\n",
"walk 0 0 0 282 9 12\n",
"walkdown 0 0 0 13 214 12\n",
"walkup 0 0 1 8 8 240"
]
}
],
"prompt_number": 23
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"pd.DataFrame(clf_decision_tree.feature_importances_).sort(columns = [0], axis=0, ascending = False).head(15)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>52 </th>\n",
" <td> 0.233659</td>\n",
" </tr>\n",
" <tr>\n",
" <th>389</th>\n",
" <td> 0.200651</td>\n",
" </tr>\n",
" <tr>\n",
" <th>559</th>\n",
" <td> 0.144526</td>\n",
" </tr>\n",
" <tr>\n",
" <th>504</th>\n",
" <td> 0.109829</td>\n",
" </tr>\n",
" <tr>\n",
" <th>69 </th>\n",
" <td> 0.096218</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57 </th>\n",
" <td> 0.028644</td>\n",
" </tr>\n",
" <tr>\n",
" <th>159</th>\n",
" <td> 0.018791</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37 </th>\n",
" <td> 0.010430</td>\n",
" </tr>\n",
" <tr>\n",
" <th>65 </th>\n",
" <td> 0.010020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>132</th>\n",
" <td> 0.008732</td>\n",
" </tr>\n",
" <tr>\n",
" <th>450</th>\n",
" <td> 0.008613</td>\n",
" </tr>\n",
" <tr>\n",
" <th>451</th>\n",
" <td> 0.006607</td>\n",
" </tr>\n",
" <tr>\n",
" <th>432</th>\n",
" <td> 0.005939</td>\n",
" </tr>\n",
" <tr>\n",
" <th>157</th>\n",
" <td> 0.005739</td>\n",
" </tr>\n",
" <tr>\n",
" <th>418</th>\n",
" <td> 0.004633</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 24,
"text": [
" 0\n",
"52 0.233659\n",
"389 0.200651\n",
"559 0.144526\n",
"504 0.109829\n",
"69 0.096218\n",
"57 0.028644\n",
"159 0.018791\n",
"37 0.010430\n",
"65 0.010020\n",
"132 0.008732\n",
"450 0.008613\n",
"451 0.006607\n",
"432 0.005939\n",
"157 0.005739\n",
"418 0.004633"
]
}
],
"prompt_number": 24
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"z = pd.DataFrame(clf.feature_importances_)\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 25
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"z_sorted = z.sort(columns = [0], axis=0, ascending = False)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 26
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"z_sorted.head(15)\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stderr",
"text": [
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n",
"/Users/user/my_anaconda/anaconda/python.app/Contents/lib/python2.7/site-packages/pandas/core/config.py:570: DeprecationWarning: height has been deprecated.\n",
"\n",
" warnings.warn(d.msg, DeprecationWarning)\n"
]
},
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>0</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>52 </th>\n",
" <td> 0.032558</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40 </th>\n",
" <td> 0.031021</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56 </th>\n",
" <td> 0.030414</td>\n",
" </tr>\n",
" <tr>\n",
" <th>558</th>\n",
" <td> 0.026724</td>\n",
" </tr>\n",
" <tr>\n",
" <th>559</th>\n",
" <td> 0.026085</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49 </th>\n",
" <td> 0.025362</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53 </th>\n",
" <td> 0.024608</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41 </th>\n",
" <td> 0.022748</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50 </th>\n",
" <td> 0.022173</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57 </th>\n",
" <td> 0.016317</td>\n",
" </tr>\n",
" <tr>\n",
" <th>381</th>\n",
" <td> 0.012155</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42 </th>\n",
" <td> 0.011127</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51 </th>\n",
" <td> 0.010070</td>\n",
" </tr>\n",
" <tr>\n",
" <th>389</th>\n",
" <td> 0.009222</td>\n",
" </tr>\n",
" <tr>\n",
" <th>201</th>\n",
" <td> 0.008905</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 27,
"text": [
" 0\n",
"52 0.032558\n",
"40 0.031021\n",
"56 0.030414\n",
"558 0.026724\n",
"559 0.026085\n",
"49 0.025362\n",
"53 0.024608\n",
"41 0.022748\n",
"50 0.022173\n",
"57 0.016317\n",
"381 0.012155\n",
"42 0.011127\n",
"51 0.010070\n",
"389 0.009222\n",
"201 0.008905"
]
}
],
"prompt_number": 27
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"num_important = 8\n",
"important_factors = (df.columns[z_sorted.index[0:num_important]].values)\n",
"important_factors\n"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 28,
"text": [
"array(['tGravityAcc-min()-X', 'tGravityAcc-mean()-X',\n",
" 'tGravityAcc-energy()-X', 'angle(X,gravityMean)',\n",
" 'angle(Y,gravityMean)', 'tGravityAcc-max()-X',\n",
" 'tGravityAcc-min()-Y', 'tGravityAcc-mean()-Y'], dtype=object)"
]
}
],
"prompt_number": 28
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"HTML(\"\"\"\n",
"<style> \n",
"\n",
"div.cell {\n",
" width: 940px;\n",
" margin-left: auto;\n",
" margin-right: auto;\n",
"}\n",
"\n",
".rendered_html {\n",
" font-size: 100%;\n",
"}\n",
"\n",
"</style>\"\"\")"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"\n",
"<style> \n",
"\n",
"div.cell {\n",
" width: 940px;\n",
" margin-left: auto;\n",
" margin-right: auto;\n",
"}\n",
"\n",
".rendered_html {\n",
" font-size: 100%;\n",
"}\n",
"\n",
"</style>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 29,
"text": [
"<IPython.core.display.HTML at 0x1078b9410>"
]
}
],
"prompt_number": 29
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" ## References\n",
"\n",
"1. Jorge L. Reyes-Ortiz, Davide Anguita, Alessandro Ghio, Luca Oneto. \n",
"Smartlab - Non Linear Complex Systems Laboratory \n",
"DITEN - Universit\u00c3 degli Studi di Genova, Genoa I-16145, Italy. \n",
"http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones\n",
"2. Random Forests\u201d by Leo Breiman and Adele Cutler. Page URL (accessed March 4, 2013): http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm \n",
"\n"
]
}
],
"metadata": {}
}
]
}
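
The two cross-tabulations recorded in the notebook can be reduced to one accuracy figure per model, which makes the forest-versus-tree comparison easier to read. The following is a minimal sketch rather than part of the original notebook: the counts are copied by hand from the recorded output cells (rows are actual classes, columns are predicted classes), and the helper names `accuracy` and `per_class_recall` are illustrative assumptions.

```python
# Confusion matrices copied from the notebook's recorded outputs.
# Rows: actual class; columns: predicted class, both in this order.
labels = ["laying", "sitting", "standing", "walk", "walkdown", "walkup"]

random_forest_cm = [
    [322, 0, 0, 0, 0, 0],
    [0, 290, 5, 0, 0, 0],
    [0, 9, 335, 0, 0, 0],
    [0, 0, 0, 301, 0, 2],
    [0, 0, 0, 4, 231, 4],
    [0, 0, 0, 0, 2, 255],
]

decision_tree_cm = [
    [322, 0, 0, 0, 0, 0],
    [0, 275, 20, 0, 0, 0],
    [0, 30, 314, 0, 0, 0],
    [0, 0, 0, 282, 9, 12],
    [0, 0, 0, 13, 214, 12],
    [0, 0, 1, 8, 8, 240],
]


def accuracy(cm):
    """Share of test rows on the diagonal (hypothetical helper)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """Correct predictions divided by row total, per actual class (hypothetical helper)."""
    return {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


rf_acc = accuracy(random_forest_cm)
dt_acc = accuracy(decision_tree_cm)
print("random forest accuracy: %.4f" % rf_acc)
print("decision tree accuracy: %.4f" % dt_acc)
print("random forest recall by class:", per_class_recall(random_forest_cm))
```

On this particular split the forest gets 1734 of the 1760 test rows right (about 98.5%), against 1647 (about 93.6%) for the single tree. Because the train/test split in the notebook is not seeded, rerunning it should give slightly different counts.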