Created
August 29, 2016 11:54
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Classifying the Digits Data Set\n", | |
"\n", | |
"### Notebook by [Aashish K Tiwari]\n", | |
"#### [Persistent Systems Ltd]\n", | |
"#### Data Source: Digits dataset from the scikit-learn package" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Table of contents\n", | |
"\n", | |
"\n", | |
"1. [Step 1: Loading Dataset](#Step-1:-Loading-Dataset)\n", | |
"\n", | |
"2. [Step 2: Cleansing](#Step-2:-Cleansing)\n", | |
"\n", | |
"3. [Step 3: Analysis](#Step-3:-Analysis)\n", | |
"\n", | |
"4. [Step 4: Classification](#Step-4:-Classification)\n", | |
"\n", | |
"5. [Step 5: Conclusion](#Step-5:-Conclusion)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Step 1: Loading Dataset\n", | |
"\n", | |
"[[ go back to the top ]](#Table-of-contents)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We use the digits dataset that ships with the scikit-learn package (`sklearn.datasets.load_digits`)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 129, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"(1797L, 64L)\n" | |
] | |
} | |
], | |
"source": [ | |
"from sklearn.datasets import load_digits\n", | |
"digits = load_digits()\n", | |
"print(digits.data.shape)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 130, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(1797L, 8L, 8L)" | |
] | |
}, | |
"execution_count": 130, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"digits.images.shape" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 131, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"array([[ 0., 0., 5., 13., 9., 1., 0., 0.],\n", | |
" [ 0., 0., 13., 15., 10., 15., 5., 0.],\n", | |
" [ 0., 3., 15., 2., 0., 11., 8., 0.],\n", | |
" [ 0., 4., 12., 0., 0., 8., 8., 0.],\n", | |
" [ 0., 5., 8., 0., 0., 9., 8., 0.],\n", | |
" [ 0., 4., 11., 0., 1., 12., 7., 0.],\n", | |
" [ 0., 2., 14., 5., 10., 12., 0., 0.],\n", | |
" [ 0., 0., 6., 13., 10., 0., 0., 0.]])" | |
] | |
}, | |
"execution_count": 131, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"digits.images[0]" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"As we can see, each digit is represented by an 8x8 matrix of pixel intensities." | |
] | |
}, | |
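{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"A minimal sketch, not part of the original analysis, checking that each row of `digits.data` is simply the flattened 8x8 image from `digits.images`. It reloads the dataset so the cell runs on its own; `digits_demo`, `first_image` and `flattened` are names introduced here for illustration only." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from sklearn.datasets import load_digits\n", | |
"\n", | |
"digits_demo = load_digits()  # illustrative name, not used elsewhere in the notebook\n", | |
"first_image = digits_demo.images[0]  # 8x8 array of pixel intensities\n", | |
"flattened = first_image.ravel()  # row-major flattening into 64 values\n", | |
"\n", | |
"print(first_image.shape, flattened.shape)\n", | |
"# Compare the flattened image with the first row of the feature matrix\n", | |
"print(np.array_equal(flattened, digits_demo.data[0]))" | |
] | |
}, | |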
{ | |
"cell_type": "code", | |
"execution_count": 132, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[0 1 2 ..., 8 9 8]\n" | |
] | |
} | |
], | |
"source": [ | |
"print(digits.target)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Step 2: Cleansing\n", | |
"\n", | |
"[[ go back to the top ]](#Table-of-contents)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"ToDo: Check for missing data." | |
] | |
}, | |
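{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"One possible way to address the ToDo above, offered as a sketch rather than the author's method: count NaN and infinite entries in the feature matrix and print the range of pixel values. The dataset is reloaded so the cell is self-contained; `digits_check`, `n_missing` and `n_infinite` are hypothetical names." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import numpy as np\n", | |
"from sklearn.datasets import load_digits\n", | |
"\n", | |
"digits_check = load_digits()  # illustrative name\n", | |
"\n", | |
"# Count entries that are NaN or infinite in the (n_samples, 64) feature matrix\n", | |
"n_missing = int(np.isnan(digits_check.data).sum())\n", | |
"n_infinite = int(np.isinf(digits_check.data).sum())\n", | |
"\n", | |
"print('Missing values: {}'.format(n_missing))\n", | |
"print('Infinite values: {}'.format(n_infinite))\n", | |
"print('Pixel range: {} to {}'.format(digits_check.data.min(), digits_check.data.max()))" | |
] | |
}, | |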
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Step 3: Analysis\n", | |
"\n", | |
"[[ go back to the top ]](#Table-of-contents)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"`digits.images` is a 3-D array of shape (1797, 8, 8) holding one 8x8 image per handwritten digit (0-9), and `digits.target` holds the corresponding labels for supervised learning. The problem: given the pixel intensities of an image, predict which digit it shows.\n", | |
"\n", | |
"Below, a single digit (the second-to-last sample) is plotted." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 133, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAL8AAADDCAYAAADTCsC8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACsxJREFUeJzt3V2MXVUZxvH/05YGWrCNwVihkwwJaDAhTgk2DRQ6GCCF\n8HVhgiQGgwlXIrRGAnqh7RVXpjUx3si3ICQW20AQFLTTQNRKoQdaWgglbdMiFAwUA0RT7OvF2cVh\n+jHrdK+958ys55dMes6ZwztvmafrrLP3WmcrIjAr0bSJbsBsojj8ViyH34rl8FuxHH4r1oy6BST5\ncJH1tYjQkR6vHf6qeNLzVqxYwYoVK3L8yJ5r7tq1K7nm6tWrWbZsWdLPTtXpdBgaGhr3eSnPOeSp\np55i6dKlSc9N+fvAxP6OmqgnHTH3gKc9VjCH34rVaviHh4cnRc1FixZlrzlv3rzsNc8888zsNSfD\n7yhXPdVd3iApJsMSiV7m/Klyz42htzl/L1Ln/FONpKO+4fW0x4o1bvglLZX0qqTXJd3eRlNmbThm\n+CVNB34BLAW+Clwv6ew2GjNr2ngj/0JgR0TsiogDwCPANc23Zda88cJ/OrBn1P291WNmk954Z3iT\nDuOMPuoxPDzcyOEysxQjIyOMjIwkPXe88L8JDIy6P0B39P+MJg75mR2PsYPvypUrj/rc8aY9m4Cz\nJA1KmglcBzyWoUezCXfMkT8iPpF0M/AHYDpwd0Rsb6Uzs4aNu6ozIp4EnmyhF7NW+QyvFcvht2I5\n/FYsh9+KVcyS5sHBwew1d+/enb1mU+bMmZO9ZhPLxOfOnZu1npc0mx2Bw2/FcvitWA6/Fcvht2I5\n/FYsh9+KlbKB/R5J+yRtaaMhs7akjPz30t3AbjaljBv+iHgWeL+FXsxa5Tm/FSvLR5R7A7v1i142\nsCctbJM0CDweEecc4Xte2DYJeGHb4TztsWKlHOp8GPgL8GVJeyTd2HxbZs1L2cB+fRuNmLXN0x4r\nlsNvxXL4rVgOvxUry0mu3FJPUvSiiWPyq1atyl6zqROECxYsyF7zvvvuy16zzWuHeeS3Yjn8ViyH\n34rl8FuxHH4rlsNvxXL4rVgpqzoHJK2X9IqkrZJuaaMxs6alnOQ6ACyPiI6kk4EXJD3ta3PZZJey\ngf3tiOhUtz8EtgOnNd2YWdN6mvNX2xkXABubaMasTclre6opzxrg1uoV4FPewG79IucV2AGQdALw\nKPBgRKwb+31fgd36Rc4rsCNJwN3AtohYnaE/s76QMue/APg2cLGkzdWXP77QJr2UDezP4ZNhNgU5\n1FYsh9+K5fBbsRx+K1ZfbmDfv3//RLeQpNPpTHQLE2poaGiiW6jFI78Vy+G3Yjn8ViyH34rl8Fux\nHH4rVsqqzhMlbZTUkbRN0p1tNGbWtJSFbf+WdHFEfCxpBvCcpMXVgjezSStp2hMRH1c3ZwLTgfca\n68isJUnhlzRNUgfYB6yPiG3NtmXWvNSR/2BEDAHzgYskDTfalVkLelrbExEfSHoCOA8YOfS4N7Bb\nv8i6gV3SqcAnEbFf0knApcBndgV7A7v1i142sKeM/F8C7pc0je406dcR8aeaPZpNuJRDnVuAc1vo\nxaxVPsNrxXL4rVgOvxXL4bdiOfxWLIffitWXn95w7bXXZq+5du3a7DWXLVuWvWbq2UmrzyO/Fcvh\nt2I5/FYsh9+K5fBbsVJ3ck2vrsjyeNMNmbUldeS/FdgGRIO9mLUq5aNL5gNXAHcBarwjs5akjPyr\ngNuAgw33YtaqY4Zf0pXAOxGxGY/6NsWMt7zhfOBqSVcAJwKfk/RARNww+knewG79opcN7IpIew8r\naQnww4i4aszjkVpjIq1bd9iF42trYm1PU3bv3p295vr167PXzD1wSiIijjhr6fU4f/+n3CxR8qrO\niNgAbGiwF7NW+QyvFcvht2I5/FYsh9+K5fBb
sRx+K1bySa6jFpgkJ7lKJ+VfnbJz587sNQcHB7PW\ny3mSy2zKcPitWA6/Fcvht2I5/FaspIVtknYB/wL+CxyIiIVNNmXWhtRVnQEMR4QvPm1TRi/THm9j\ntCklNfwBPCNpk6SbmmzIrC2p054LIuItSV8Anpb0akQ8e+ib3sNr/aKRPbyf/gfST4EPI+Jn1X0v\nb5gEvLzhcCkfWjVL0inV7dnAZcCWrB2aTYCUac8XgbXVyDEDeCgi/thoV2YtSLkC+05gqIVezFrl\nM7xWLIffiuXwW7EcfiuWw2/FcvitWH15BfYmNHFl806nk72mtccjvxXL4bdiOfxWLIffiuXwW7FS\nljTPlbRG0nZJ2yQtaqMxs6alHOr8OfD7iPimpBnA7IZ7MmvFMcMvaQ5wYUR8ByAiPgE+aKMxs6aN\nN+05A3hX0r2SXpT0K0mz2mjMrGnjTXtmAOcCN0fE85JWA3cAPxn9JG9gt36RbQO7pHnAXyPijOr+\nYuCOiLhy1HMmxQb20pc3LF++PHvNKb2BPSLeBvZI+nL10CXAK1m7M5sgKUd7vg88JGkm8AZwY7Mt\nmbUjZQP7S8DXW+jFrFU+w2vFcvitWA6/Fcvht2I5/FYsh9+KVcwG9v3792evuW7duuw1N2zYkL0m\nwJIlS7LXzH02tm0e+a1YDr8Vy+G3Yjn8ViyH34qVsoH9K5I2j/r6QNItbTRn1qSUVZ2vAQsAJE0D\n3gTWNtyXWeN6nfZcArwREXuaaMasTb2G/1vAb5poxKxtyWd4q51cVwG3j/2eN7Bbv+hlA3svyxsu\nB16IiHfHfmN0+M0m0tjBd+XKlUd9bi/TnuuBh4+7K7M+kxR+SbPpvtn9XbPtmLUnadoTER8Bpzbc\ni1mrfIbXiuXwW7FaDX8THxnYRM2tW7dmr9nEZpomNNFn7t9RrnoO/xE4/Hk5/GZ9xuG3Yh3zI8qT\nCkj9//nkVrSjfUR57fCbTVae9lixHH4rlsNvxWol/JKWSnpV0uuSDtsPcJw175G0T9KWHPWqmgOS\n1kt6RdLWunuVJZ0oaaOkTnUB7zsz9jq92lP9eKZ6uyS9XNX8e6aaWS9gnn0/eUQ0+gVMB3YAg8AJ\nQAc4O0PdC+nuLd6Ssdd5wFB1+2Tgtbq9ArOqP2cAfwMWZ+r1B8BDwGOZ6u0EPp/5d38/8N1Rf/85\nGWtPA94CBo63Rhsj/0JgR0TsiogDwCPANXWLRsSzwPt164yp+XZEdKrbHwLbgdNq1vy4ujmT7kDw\nXq0mAUnzgSuAu4AjHsY73tLZCv3/Aub3QPcC5hGR8wLmtfeTtxH+04HRDe6tHutrkgbpvrJsrFln\nmqQOsA9YHxHb6nfHKuA24GCGWocE8IykTZJuylCv6QuY195P3kb4J92JBEknA2uAW6tXgOMWEQcj\nYgiYD1wkabhmb1cC70TEZvKO+hdExAK621W/J+nCmvUOXcD8lxFxLvAR3QuY1zZqP/lv69RpI/xv\nAgOj7g/QHf37kqQTgEeBByMi22eQVy/5TwDn1Sx1PnC1pJ10t5V+Q9IDGfp7q/rzXbqfy7SwZsm9\nwN6IeL66v4buP4YcjrqfvBdthH8TcJakwepf7HXAYy383J5JEnA3sC0iVmeod6qkudXtk4BLgc11\nakbEjyNiICLOoPvS/+eIuKFmn7MknVLdng1cBtQ6ihbNXsA8z37ynO/uj/HO/HK6R052AD/KVPNh\n4B/Af+i+p7gxQ83FdOfRHboh3QwsrVHvHODFqt7LwG2Z/78uIcPRHrrz8071tTXj7+hrwPPAS3T3\nf9c+2gPMBv4JnFK3ltf2WLF8hteK5fBbsRx+K5bDb8Vy+K1YDr8Vy+G3Yv0P9+D2+GcOaowAAAAA\nSUVORK5CYII=\n", | |
"text/plain": [ | |
"<matplotlib.figure.Figure at 0x31147e48>" | |
] | |
}, | |
"metadata": {}, | |
"output_type": "display_data" | |
} | |
], | |
"source": [ | |
"import matplotlib.pyplot as plt\n", | |
"%matplotlib inline\n", | |
"plt.figure(1, figsize=(3, 3))\n", | |
"plt.imshow(digits.images[-2], cmap=plt.cm.gray_r, interpolation='nearest')\n", | |
"plt.show()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Uncomment and run the cell below to see details about the dataset." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 134, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"#digits" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 135, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1797" | |
] | |
}, | |
"execution_count": 135, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(digits.images)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"### Reshape data to feed to classifier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 136, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"(1797L, 64L)" | |
] | |
}, | |
"execution_count": 136, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Flatten each 8x8 image into a 64-element feature vector\n", | |
"data = digits.images.reshape((len(digits.images), -1))\n", | |
"data.shape" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Step 4: Classification\n", | |
"\n", | |
"[[ go back to the top ]](#Table-of-contents)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 137, | |
"metadata": { | |
"collapsed": true | |
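}, | |
"outputs": [], | |
"source": [ | |
"# train_test_split lives in sklearn.model_selection (scikit-learn >= 0.18);\n", | |
"# the older sklearn.cross_validation module has been removed\n", | |
"from sklearn.model_selection import train_test_split\n", | |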
"X = data\n", | |
"y = digits.target\n", | |
"\n", | |
"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Support Vector Classifier" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"We set `gamma=0.001` for the support vector classifier instead of using scikit-learn's default value." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 138, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.9907407407407407" | |
] | |
}, | |
"execution_count": 138, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from sklearn import svm\n", | |
"svc_classifier = svm.SVC(gamma=0.001)\n", | |
"svc_classifier.fit(X_train, y_train)\n", | |
"standard_svc = svc_classifier.score(X_test, y_test)\n", | |
"standard_svc" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 139, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3,\n", | |
" gamma=0.001, kernel='rbf', max_iter=-1, probability=False,\n", | |
" random_state=None, shrinking=True, tol=0.001, verbose=False)" | |
] | |
}, | |
"execution_count": 139, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"svc_classifier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 140, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" precision recall f1-score support\n", | |
"\n", | |
" 0 1.00 1.00 1.00 178\n", | |
" 1 1.00 1.00 1.00 182\n", | |
" 2 1.00 1.00 1.00 177\n", | |
" 3 0.99 1.00 1.00 183\n", | |
" 4 1.00 1.00 1.00 181\n", | |
" 5 0.99 0.99 0.99 182\n", | |
" 6 1.00 1.00 1.00 181\n", | |
" 7 1.00 0.99 1.00 179\n", | |
" 8 1.00 0.99 1.00 174\n", | |
" 9 0.98 0.98 0.98 180\n", | |
"\n", | |
"avg / total 1.00 1.00 1.00 1797\n", | |
"\n", | |
"[[178 0 0 0 0 0 0 0 0 0]\n", | |
" [ 0 182 0 0 0 0 0 0 0 0]\n", | |
" [ 0 0 177 0 0 0 0 0 0 0]\n", | |
" [ 0 0 0 183 0 0 0 0 0 0]\n", | |
" [ 0 0 0 0 181 0 0 0 0 0]\n", | |
" [ 0 0 0 0 0 180 0 0 0 2]\n", | |
" [ 0 0 0 0 0 0 181 0 0 0]\n", | |
" [ 0 0 0 0 0 0 0 178 0 1]\n", | |
" [ 0 0 0 0 0 0 0 0 173 1]\n", | |
" [ 0 0 0 1 0 2 0 0 0 177]]\n" | |
] | |
} | |
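], | |
"source": [ | |
"from sklearn import metrics\n", | |
"\n", | |
"# Note: predictions are made on the full dataset X, which includes the\n", | |
"# training samples, so this report is more optimistic than the held-out score.\n", | |
"expected_svc = y\n", | |
"predicted_svc = svc_classifier.predict(X)\n", | |
"\n", | |
"# summarize the fit of the model\n", | |
"print(metrics.classification_report(expected_svc, predicted_svc))\n", | |
"print(metrics.confusion_matrix(expected_svc, predicted_svc))" | |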
] | |
}, | |
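{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"The report above covers all 1797 samples, including those the model was trained on. As a sketch of a stricter check, the cell below computes the same report on the held-out split only. It assumes `svc_classifier`, `X_test` and `y_test` from the earlier cells are still in memory; `predicted_test` is a name introduced here." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn import metrics\n", | |
"\n", | |
"# Predict only on samples the classifier has not seen during fit()\n", | |
"predicted_test = svc_classifier.predict(X_test)\n", | |
"\n", | |
"print(metrics.classification_report(y_test, predicted_test))\n", | |
"print(metrics.confusion_matrix(y_test, predicted_test))" | |
] | |
}, | |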
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## K-Nearest Neighbour" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 141, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.9907407407407407" | |
] | |
}, | |
"execution_count": 141, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from sklearn.neighbors import KNeighborsClassifier\n", | |
"knn_classifier = KNeighborsClassifier()\n", | |
"knn_classifier.fit(X_train,y_train)\n", | |
"knn_score = knn_classifier.score(X_test, y_test)\n", | |
"knn_score" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 142, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", | |
" metric_params=None, n_neighbors=5, p=2, weights='uniform')" | |
] | |
}, | |
"execution_count": 142, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"knn_classifier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 143, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Best score: 0.977740678909\n", | |
"Best parameters: {'n_neighbors': 3, 'metric': 'euclidean'}\n" | |
] | |
} | |
], | |
"source": [ | |
"# GridSearchCV and StratifiedKFold live in sklearn.model_selection (scikit-learn >= 0.18)\n", | |
"from sklearn.model_selection import GridSearchCV, StratifiedKFold\n", | |
"from sklearn.neighbors import KNeighborsClassifier\n", | |
"\n", | |
"param_range = list(range(1, 11))  # candidate values for n_neighbors: 1..10\n", | |
"\n", | |
"knn_classifier = KNeighborsClassifier()\n", | |
"parameter_grid = [{'n_neighbors': param_range,\n", | |
"                   'metric': ['euclidean']},\n", | |
"                  {'n_neighbors': param_range,\n", | |
"                   'metric': ['manhattan']},\n", | |
"                  {'n_neighbors': param_range,\n", | |
"                   'metric': ['minkowski']}]\n", | |
"\n", | |
"\n", | |
"# 10-fold stratified cross-validation; the labels are supplied later by fit()\n", | |
"cross_validation = StratifiedKFold(n_splits=10)\n", | |
"\n", | |
"grid_search = GridSearchCV(knn_classifier,\n", | |
" param_grid=parameter_grid,\n", | |
" cv=cross_validation)\n", | |
"\n", | |
"\n", | |
"grid_search.fit(X, y)\n", | |
"print('Best score: {}'.format(grid_search.best_score_))\n", | |
"print('Best parameters: {}'.format(grid_search.best_params_))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 144, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# knn_classifier.fit(X,y)\n", | |
"# knn_classifier.score(X, y)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 145, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n", | |
" metric_params=None, n_neighbors=5, p=2, weights='uniform')" | |
] | |
}, | |
"execution_count": 145, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"knn_classifier" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Naive Bayes" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 146, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.79259259259259263" | |
] | |
}, | |
"execution_count": 146, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from sklearn.naive_bayes import GaussianNB\n", | |
"gaussian_nb_classifier = GaussianNB()\n", | |
"gaussian_nb_classifier.fit(X_train,y_train)\n", | |
"gaussian_nb_score = gaussian_nb_classifier.score(X_test, y_test)\n", | |
"gaussian_nb_score" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 147, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"GaussianNB()" | |
] | |
}, | |
"execution_count": 147, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"gaussian_nb_classifier" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## Decision Tree Classifier" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 148, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.8574074074074074" | |
] | |
}, | |
"execution_count": 148, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"from sklearn.tree import DecisionTreeClassifier\n", | |
"decision_tree_classifier = DecisionTreeClassifier(max_depth=10)\n", | |
"decision_tree_classifier.fit(X_train,y_train)\n", | |
"dct_score = decision_tree_classifier.score(X_test, y_test)\n", | |
"dct_score" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 149, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=10,\n", | |
" max_features=None, max_leaf_nodes=None, min_samples_leaf=1,\n", | |
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n", | |
" random_state=None, splitter='best')" | |
] | |
}, | |
"execution_count": 149, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"decision_tree_classifier" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"#### ToDo: Ensemble Learning" | |
] | |
}, | |
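{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"One way the ensemble ToDo above could be approached, as a hedged sketch rather than a finished result: a hard-voting ensemble built with scikit-learn's `VotingClassifier` over three of the model types used in this notebook. It assumes `X_train`, `X_test`, `y_train` and `y_test` from the split in Step 4; `voting_classifier` and `ensemble_score` are hypothetical names, and the choice of base models is illustrative." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from sklearn.ensemble import VotingClassifier\n", | |
"from sklearn.neighbors import KNeighborsClassifier\n", | |
"from sklearn.svm import SVC\n", | |
"from sklearn.tree import DecisionTreeClassifier\n", | |
"\n", | |
"# Hard voting: each base model casts one vote and the majority class wins\n", | |
"base_models = [('svc', SVC(gamma=0.001)),\n", | |
"               ('knn', KNeighborsClassifier()),\n", | |
"               ('tree', DecisionTreeClassifier(max_depth=10))]\n", | |
"voting_classifier = VotingClassifier(estimators=base_models, voting='hard')\n", | |
"\n", | |
"voting_classifier.fit(X_train, y_train)\n", | |
"ensemble_score = voting_classifier.score(X_test, y_test)  # mean accuracy on the test split\n", | |
"ensemble_score" | |
] | |
}, | |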
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"##Step 5: Conclusion\n", | |
"\n", | |
"[[ go back to the top ]](#Table-of-contents)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 150, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"class ListTable(list):\n", | |
"    \"\"\"Overridden list class which takes a 2-dimensional list of\n", | |
"    the form [[1,2,3],[4,5,6]], and renders an HTML table in\n", | |
"    IPython Notebook.\"\"\"\n", | |
"\n", | |
"    def _repr_html_(self):\n", | |
"        html = [\"<table>\"]\n", | |
"        for row in self:\n", | |
"            html.append(\"<tr>\")\n", | |
"\n", | |
"            for col in row:\n", | |
"                html.append(\"<td>{0}</td>\".format(col))\n", | |
"\n", | |
"            html.append(\"</tr>\")\n", | |
"        html.append(\"</table>\")\n", | |
"        return ''.join(html)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 152, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<table><tr><td></td><td>Support Vector Classifier</td><td>K-Nearest Neighbours</td><td>Decision Trees</td><td>Naive Bayes</td><td>Ensemble</td></tr><tr><td>Test Accuracy</td><td>0.990740740741</td><td>0.990740740741</td><td>0.857407407407</td><td>0.792592592593</td><td></td></tr><tr><td>Model Params</td><td>gamma=0.001</td><td>minkowski, neighbors=5</td><td>gini, max_depth 10</td><td>Gaussian</td><td></td></tr></table>" | |
], | |
"text/plain": [ | |
"[['',\n", | |
" 'Support Vector Classifier',\n", | |
" 'K-Nearest Neighbours',\n", | |
" 'Decision Trees',\n", | |
" 'Naive Bayes',\n", | |
" 'Ensemble'],\n", | |
" ['Test Accuracy',\n", | |
" 0.9907407407407407,\n", | |
" 0.9907407407407407,\n", | |
" 0.8574074074074074,\n", | |
" 0.79259259259259263,\n", | |
" ''],\n", | |
" ['Model Params',\n", | |
" 'gamma=0.001',\n", | |
" 'minkowski, neighbors=5',\n", | |
" 'gini, max_depth 10',\n", | |
" 'Gaussian',\n", | |
" '']]" | |
] | |
}, | |
"execution_count": 152, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"table = ListTable()\n", | |
"table.append(['', 'Support Vector Classifier', 'K-Nearest Neighbours', 'Decision Trees', 'Naive Bayes', 'Ensemble'])\n", | |
"# score() returns mean accuracy on the test split, so the row is labelled as accuracy\n", | |
"table.append(['Test Accuracy', standard_svc, knn_score, dct_score, gaussian_nb_score, ''])\n", | |
"table.append(['Model Params', 'gamma=0.001', 'minkowski, neighbors=5', 'gini, max_depth 10', 'Gaussian', ''])\n", | |
"table" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.10" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |