@depindersharma
Created February 2, 2020 10:36
Created on Cognitive Class Labs
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a href=\"https://www.bigdatauniversity.com\"><img src=\"https://ibm.box.com/shared/static/cw2c7r3o20w9zn8gkecaeyjhgw3xdgbj.png\" width=\"400\" align=\"center\"></a>\n",
"\n",
"<h1 align=center><font size=\"5\"> SVM (Support Vector Machines)</font></h1>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, you will use SVM (Support Vector Machines) to build and train a model using human cell records, and classify cells to whether the samples are benign or malignant.\n",
"\n",
"SVM works by mapping data to a high-dimensional feature space so that data points can be categorized, even when the data are not otherwise linearly separable. A separator between the categories is found, then the data is transformed in such a way that the separator could be drawn as a hyperplane. Following this, characteristics of new data can be used to predict the group to which a new record should belong."
]
},
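{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of the mapping idea above (a minimal sketch, not part of the original lab): the four XOR-style points below cannot be split by a straight line in two dimensions, but adding a third feature `x1 * x2` makes them separable by the plane `z = 0`. The helper name `lift` and the toy points are assumptions made only for this example.\n",
"\n",
"```python\n",
"# Toy XOR-style data: label is +1 when both coordinates share a sign, else -1.\n",
"points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]\n",
"labels = [1, 1, -1, -1]\n",
"\n",
"def lift(p):\n",
"    # Hypothetical feature map: (x1, x2) -> (x1, x2, x1 * x2).\n",
"    x1, x2 = p\n",
"    return (x1, x2, x1 * x2)\n",
"\n",
"lifted = [lift(p) for p in points]\n",
"\n",
"# In the lifted space the hyperplane z = 0 separates the classes:\n",
"# predict +1 when the third coordinate is positive, otherwise -1.\n",
"predictions = [1 if z > 0 else -1 for (_, _, z) in lifted]\n",
"print(predictions == labels)\n",
"```\n",
"\n",
"An SVM kernel achieves this kind of lifting implicitly, without computing the new coordinates explicitly."
]
},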
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h1>Table of contents</h1>\n",
"\n",
"<div class=\"alert alert-block alert-info\" style=\"margin-top: 20px\">\n",
" <ol>\n",
" <li><a href=\"#load_dataset\">Load the Cancer data</a></li>\n",
" <li><a href=\"#modeling\">Modeling</a></li>\n",
" <li><a href=\"#evaluation\">Evaluation</a></li>\n",
" <li><a href=\"#practice\">Practice</a></li>\n",
" </ol>\n",
"</div>\n",
"<br>\n",
"<hr>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pylab as pl\n",
"import numpy as np\n",
"import scipy.optimize as opt\n",
"from sklearn import preprocessing\n",
"from sklearn.model_selection import train_test_split\n",
"%matplotlib inline \n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<h2 id=\"load_dataset\">Load the Cancer data</h2>\n",
"The example is based on a dataset that is publicly available from the UCI Machine Learning Repository (Asuncion and Newman, 2007)[http://mlearn.ics.uci.edu/MLRepository.html]. The dataset consists of several hundred human cell sample records, each of which contains the values of a set of cell characteristics. The fields in each record are:\n",
"\n",
"|Field name|Description|\n",
"|--- |--- |\n",
"|ID|Clump thickness|\n",
"|Clump|Clump thickness|\n",
"|UnifSize|Uniformity of cell size|\n",
"|UnifShape|Uniformity of cell shape|\n",
"|MargAdh|Marginal adhesion|\n",
"|SingEpiSize|Single epithelial cell size|\n",
"|BareNuc|Bare nuclei|\n",
"|BlandChrom|Bland chromatin|\n",
"|NormNucl|Normal nucleoli|\n",
"|Mit|Mitoses|\n",
"|Class|Benign or malignant|\n",
"\n",
"<br>\n",
"<br>\n",
"\n",
"For the purposes of this example, we're using a dataset that has a relatively small number of predictors in each record. To download the data, we will use `!wget` to download it from IBM Object Storage. \n",
"__Did you know?__ When it comes to Machine Learning, you will likely be working with large datasets. As a business, where can you host your data? IBM is offering a unique opportunity for businesses, with 10 Tb of IBM Cloud Object Storage: [Sign up now for free](http://cocl.us/ML0101EN-IBM-Offer-CC)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2020-02-01 19:06:32-- https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/cell_samples.csv\n",
"Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196\n",
"Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 20675 (20K) [text/csv]\n",
"Saving to: ‘cell_samples.csv’\n",
"\n",
"cell_samples.csv 100%[===================>] 20.19K --.-KB/s in 0.02s \n",
"\n",
"2020-02-01 19:06:32 (954 KB/s) - ‘cell_samples.csv’ saved [20675/20675]\n",
"\n"
]
}
],
"source": [
"#Click here and press Shift+Enter\n",
"!wget -O cell_samples.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/cell_samples.csv"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"### Load Data From CSV File "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Clump</th>\n",
" <th>UnifSize</th>\n",
" <th>UnifShape</th>\n",
" <th>MargAdh</th>\n",
" <th>SingEpiSize</th>\n",
" <th>BareNuc</th>\n",
" <th>BlandChrom</th>\n",
" <th>NormNucl</th>\n",
" <th>Mit</th>\n",
" <th>Class</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1000025</td>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1002945</td>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>10</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1015425</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1016277</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1017023</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Clump UnifSize UnifShape MargAdh SingEpiSize BareNuc \\\n",
"0 1000025 5 1 1 1 2 1 \n",
"1 1002945 5 4 4 5 7 10 \n",
"2 1015425 3 1 1 1 2 2 \n",
"3 1016277 6 8 8 1 3 4 \n",
"4 1017023 4 1 1 3 2 1 \n",
"\n",
" BlandChrom NormNucl Mit Class \n",
"0 3 1 1 2 \n",
"1 3 2 1 2 \n",
"2 3 1 1 2 \n",
"3 3 7 1 2 \n",
"4 3 1 1 2 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cell_df = pd.read_csv(\"cell_samples.csv\")\n",
"cell_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ID field contains the patient identifiers. The characteristics of the cell samples from each patient are contained in fields Clump to Mit. The values are graded from 1 to 10, with 1 being the closest to benign.\n",
"\n",
"The Class field contains the diagnosis, as confirmed by separate medical procedures, as to whether the samples are benign (value = 2) or malignant (value = 4).\n",
"\n",
"Lets look at the distribution of the classes based on Clump thickness and Uniformity of cell size:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8QZhcZAAAgAElEQVR4nO3dfXRU9b3v8fcXkpSJkmgh9nLEm6G9UsODRohZHKFHVJCuK1Xrsr2leq/SKF2tQVtrq21Xfeg6p8vj8bb2aG/vpY3IaUu0Bx9LfUB6dFWtbQhCFTIHrHVATrmHIXpzqolNQr73j5k8AoY8zN472Z/XWqw988tk7y+/2fnkl9/M7J+5OyIiEh8Twi5ARESCpeAXEYkZBb+ISMwo+EVEYkbBLyISMwVhF3Aspk6d6slkMuwyRETGlK1btx5097KB7WMi+JPJJI2NjWGXISIyppjZniO1a6pHRCRmFPwiIjGj4BcRiZkxMcd/JB0dHezbt4/33nsv7FLGlUmTJjF9+nQKCwvDLkVE8mTMBv++ffuYPHkyyWQSMwu7nHHB3Wlubmbfvn3MmDEj7HJEJE/yNtVjZveZ2QEz29Gn7YNm9oyZvZbbnjjc/b/33ntMmTJFoT+KzIwpU6bE5q+oTKaVLVv2k8m0hlpHKtXMunU7SKWaQ60jCqLSF1E4NzZufJ2rr36ajRtfH/V953PEfz9wL/BPfdpuBn7l7neY2c25+zcN9wAK/dEXlz6tr09RU/M0RUUTaG/voq5uGStWVARex+rVm7n33u0992trK7nnniWB1xEFUemLKJwbc+euZceO7C+/urpXmTt3Cq+8snLU9p+3Eb+7/xp4a0DzxcC63O11wCX5Or7I0WQyrdTUPE1bWyctLe20tXVSU/N04KO7VKq5X9AB3Hvv9tBHu2GISl9E4dzYuPH1ntDv9uqrzaM68g/6XT0fcvf9ALntSUd7oJmtMrNGM2vMZDKBFRiU5557juXLlwPw+OOPc8cddwR27O3bt/PEE08EdryoSadbKCrqf+oXFk4gnW4JtI6Ghv1Dah/PotIXUTg3Hn30D0NqH47Ivp3T3de4e5W7V5WVHfaJ43Hloosu4uabbw7seHEP/mSylPb2rn5tHR1dJJOlgdZRXT1tSO3jWVT6IgrnxiWX/JchtQ9H0MH/72Y2DSC3PRDkwUf7BZt0Os1pp53G1VdfzZw5c7j88svZvHkzCxcu5NRTT6WhoYGGhgbOPvtszjzzTM4++2x27dp12H7uv/9+amtrAXj99ddZsGABZ511FrfccgvHH388kP0LYfHixVx22WWcdtppXH755XSvnvbtb3+bs846izlz5rBq1aqe9sWLF3PTTTdRXV3NzJkzef7552lvb+eWW27hwQcfpLKykgcffHBU+mIsKSsrpq5uGYlEASUlRSQSBdTVLaOsrDjQOioqplBbW9mvrba2koqKKYHWEQVR6YsonBvLl3+EuXP7/7/nzp3C8uUfGb2DuHve/gFJYEef+/8A3Jy7fTNw57HsZ/78+T5QU1PTYW3vZ/36Jk8kvuelpd/3ROJ7vn790L7/SN544w2fOHGiv/LKK37o0CGfN2+er1y50ru6uvzRRx/1iy++2FtaWryjo8Pd3Z955hm/9NJL3d392Wef9QsvvNDd3deuXevXXnutu7tfeOGFvn79end3/+EPf+jHHXdcz+NLSkr8zTff9EOHDvmCBQv8+eefd3f35ubmnpquuOIKf/zxx93d/ZxzzvEbbrjB3d1/+ctf+vnnn3/Y8Y5kqH07Vh048K43NPzJDxx4N9Q6mpoO+v33v+pNTQdDrSMKotIXUTg3fvGLP3hNzVP+i1/8Ydj7ABr9CJmat3f1mFk9sBiYamb7gFuBO4Cfm1kNsBf4VL6O31ffF2za2rJtNTVPs2RJ+Yh/k8+YMYO5c+cCMHv2bM4//3zMjLlz55JOp2lpaeHKK6
/ktddew8zo6Oh43/299NJLPProowB89rOf5cYbb+z5WnV1NdOnTwegsrKSdDrNokWLePbZZ7nzzjtpbW3lrbfeYvbs2XziE58A4NJLLwVg/vz5pNPpEf1fx5uysuLAR/lHUlExJZaj/COJSl9E4dxYvvwjozvK7yNvwe/uK47ypfPzdcyj6X7Bpjv0ofcFm5E+uR/4wAd6bk+YMKHn/oQJE+js7ORb3/oW5557Lo888gjpdJrFixePyrEmTpxIZ2cn7733Hl/84hdpbGzklFNO4bbbbuv3Pvzu7+l+vIhIZF/cHU1hvmDT0tLCySefDGTn8gezYMECHnroIQAeeOCBQR/fHfJTp07lnXfeYcOGDYN+z+TJk/nzn/886ONEZHyKRfCH+YLN1772Nb7+9a+zcOFCDh06NOjj7777br773e9SXV3N/v37KS19/19OJ5xwAtdccw1z587lkksu4ayzzhr0GOeeey5NTU2xfXFXJO7Mc+8AibKqqiofuBBLKpWiomJon6bLZFpJp1tIJktDn787mtbWVhKJBGbGAw88QH19PY899ligNQynb0Ukesxsq7tXDWwfsxdpG44ovGAzmK1bt1JbW4u7c8IJJ3DfffeFXZKIjDOxCv6x4GMf+xi///3vwy5DRMaxWMzxi4hILwW/iEjMKPhFRGJGwS8iEjMK/hFIp9PMmTNnxPtpbGzkuuuuG4WKREQGp3f1REBVVRVVVYe91VZEJC9iNuLPAFty29HR2dnJlVdeyemnn85ll11Ga2srW7du5ZxzzmH+/PksW7aM/fuzi0kc6TLJ0H9Rlkwmw9KlS5k3bx6f//znKS8v5+DBg6TTaSoqKrjmmmuYPXs2F1xwAW19Lz4kInKMYhT89UA5sDS3rR+Vve7atYtVq1bxyiuvUFJSwg9+8ANWr17Nhg0b2Lp1K5/73Of45je/2fP4zs5OGhoauPvuu7n99tsP29/tt9/Oeeedx8svv8wnP/lJ9u7d2/O11157jWuvvZadO3dywgkn9FzTR0RkKGIy1ZMBaoC23D9y95cAI1vd65RTTmHhwoUAXHHFFXznO99hx44dLF26FIBDhw4xbVrvKkKDXSb5hRde4JFHHgHg4x//OCeeeGLP12bMmEFlZeX7fr+IyGBiEvxpoIje0AcozLWPLPjNrN/9yZMnM3v2bF566aUjPn6wyyS/37WTBl6WWVM9IjIcMZnqSQLtA9o6cu0js3fv3p6Qr6+vZ8GCBWQymZ62jo4Odu7cecz7W7RoET//+c8B2LRpE2+//faIaxQR6SsmwV8G1AEJoCS3rWOko32AiooK1q1bx+mnn85bb73VM79/0003ccYZZ1BZWclvfvObY97frbfeyqZNm5g3bx5PPvkk06ZNY/LkySOuU0SkW6wuy5yd60+THemPPPTz4S9/+QsTJ06koKCAl156iS984Qts37490Bp0WWaR8UGXZQayYR/NwO+2d+9ePv3pT9PV1UVRURE/+tGPwi5JRMaZmAV/9J166qls27Yt7DJEZBwb03P8Y2GaaqxRn4qMf2M2+CdNmkRzc7OCahS5O83NzUyaNCnsUkQkj8bsVM/06dPZt28fmczoXX5Bsr9Qp0+fHnYZIpJHYzb4CwsLmTFjRthliIiMOWN2qkdERIZHwS8iEjMKfhGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZgJJfjN7MtmttPMdphZvZnpcpASYxlgS24bYhWZVrZs2U8m0xpqHZJ/gQe/mZ0MXAdUufscYCLwmaDrEImGeqAcWJrb1odTRX2K8vI1LF36z5SXr6G+PhVKHRKMsKZ6CoCEmRUAxcCfQqpDJEQZoAZoA1py2xqCHvlnMq3U1DxNW1snLS3ttLV1UlPztEb+41jgwe/u/wbcBewF9gMt7r5p4OPMbJWZNZpZo665L+NTGiga0FaYaw+winQLRUX9o6CwcALpdEugdUhwwp
jqORG4GJgB/BVwnJldMfBx7r7G3avcvaqsLNoLpIsMTxJoH9DWkWsPsIpkKe3tXf2r6OgimSwNtA4JThhTPUuAN9w94+4dwMPA2SHUIRKyMqAOSAAluW1drj3AKsqKqatbRiJRQElJEYlEAXV1yygrKw60DglOGCtw7QUWmFkx2UnN84HGEOoQiYAVZMdCabIj/XD+ul2xooIlS8pJp1tIJksV+uNc4MHv7r8zsw3Ay0AnsA1YE3QdItFRRliB36+KsmIFfkyEsuauu98K3BrGsUVE4k6f3BURiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMwo+EVEYkbBLyISMwp+EZGYUfDHTCbTypYt+7WeqkSSzs9eqVQz69btIJVqHvV9h3JZZglHfX2KmpqnKSqaQHt7F3V1y1ixoiLsskQAnZ99rV69mXvv3d5zv7a2knvuWTJq+zd3H7Wd5UtVVZU3NmqRrpHIZFopL19DW1tnT1siUcCePau0+IaETudnr1SqmVmz1h7W3tS0koqKKUPal5ltdfeqge2a6omJdLqFoqL+T3dh4QTS6ZaQKhLppfOzV0PD/iG1D4eCPyaSyVLa27v6tXV0dJFMloZUkUgvnZ+9qqunDal9OBT8MVFWVkxd3TISiQJKSopIJAqoq1sWuz+jJZp0fvaqqJhCbW1lv7ba2sohT/O8H83xx0wm00o63UIyWRrLHyqJNp2fvVKpZhoa9lNdPW3YoX+0OX69qydmysqKY/8DJdGl87NXRcWUUR3l96WpHhGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxMyQgt/MjstXISIiEoxjCn4zO9vMmoBU7v4ZZva/8lqZiIjkxbGO+L8HLAOaAdz998Df5KsoERHJn2Oe6nH3Nwc0HRruQc3sBDPbYGb/amYpM/vr4e5LhkaLWfeKTl9kgC25rURBFM6NKCy2/qaZnQ24mRUB15Gb9hmm7wNPuftluf3pOqwB0GLWvaLTF/VADVAEtAN1wIoQ6pBuUTg3IrHYuplNJRvWSwADNgHXu/uQfxWZWQnwe+DDfoyrwGghlpHTYta9otMXGaAcaOvTlgD2AGUB1iHdonBuRGmx9YS7X+7uH3L3k9z9CqBwSBX0+jDZM36tmW0zsx8f6d1CZrbKzBrNrDGT0Z/AI6XFrHtFpy/SZEf6/SrJtUsYonBuRGmx9TfMrN7MEn3anhjmMQuAecAP3f1M4F3g5oEPcvc17l7l7lVlZRr9jJQWs+4Vnb5Ikp3e6VdJrl3CEIVzI0qLrb8KPA+8YGYfybXZMI+5D9jn7r/L3d9A9heB5JEWs+4Vnb4oIzunnwBKcts6NM0TniicG5FZbN3MXnb3eWa2EPgRcBNwu7sPK7DN7HnganffZWa3Ace5+1eP9njN8Y8eLWbdKzp9kSE7vZNEoR8NUTg38rnY+rEG/7bctAxmNg14EKhy92H1iJlVAj8mO8H5R2Clu799tMcr+EVEhu5owX+sb+f8r9033H2/mZ0HnD3cYtx9O3BYMSIikn/vG/xmdoW7/xRYYXbEKf1f56UqERHJm8FG/N1vs5yc70JERCQY7xv87v5/ctvbgylHRETy7X3fzmlm15jZqbnbZmb3mVmLmb1iZmcGU6KIiIymwd7Hfz29HyNcAZxB9pO3NwD/mL+yREQkXwYL/k5378jdXg78k7s3u/tmeuf/RURkDBks+LvMbJqZTQLOBzb3+VriKN8jIiIRNti7er4FNAITgcfdfSeAmZ1D9oNXIiIyxgwW/MVkrxt7uru/3Ke9EfhveatKRETyZrCpnq+7eyfZyyv0cPd33f2d/JUlIiL5MtiIv9nMngVmmNnjA7/o7hflpywREcmXwYL/QrKXTP4J8D/zX46IiOTbYJ/cbQd+a2Znu7uWwRIRGQcGu0jb3e7+JeA+Mzvs+s
2a6hmaKFzjOwo1RIX6oj/1R6/RuBZ+lGsYbKrnJ7ntXaN61Biqr09RU/M0RUUTaG/voq5uGStWVMSuhqhQX/Sn/ui1evVm7r13e8/92tpK7rlnybiq4ZgWYgnbWF+IJZNppbx8DW1tnT1tiUQBe/asCmxkFYUaokJ90Z/6o1cq1cysWWsPa29qWhnYyH80azjaQizHtOaumS00s2fMbLeZ/dHM3jAzfYDrGKXTLRQV9e/qwsIJpNMtsaohKtQX/ak/ejU07B9S+1it4VhX4KoDvgxsBQ6N2tFjIpkspb29q19bR0cXyWRprGqICvVFf+qPXtXV04bUPlZrOKYRP9Di7k+6+4HcRdqa3b151KoY58rKiqmrW0YiUUBJSRGJRAF1dcsC/TM6CjVEhfqiP/VHr4qKKdTWVvZrq62tDPQF3iBqONbF1u8ge72eh4G/dLcPuIxD3oz1Of5uUXjXRBRqiAr1RX/qj17j5V09R5vjP9bgfzZ3s/vBBri7nzesaoZovAS/iEiQjhb8g72P/4bczY25rQMZ4AV3f2N0SxQRkSAMNsc/Offv+Ny/yUAV8KSZfSbPtYmISB4MdsmGIy6ybmYfJLsoywP5KEpERPLnWN/V04+7v0V2nl9ERMaYYQW/mZ0HvD3KtYiISAAGe3H3VXrfydPtg8CfgP+Rr6JERCR/Bvvk7vIB9x1odvd381SPiIjk2WAv7u4JqhAREQnGsOb4RURk7FLwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzIQW/GY20cy2mdnGwR8tMvoOHtxDU9OTHDyody1D9nr8W7bsJ5NpjXUNUakjlWpm3bodpFKjv+ZVmCP+64FUiMeXGHvxxbsoLj6Vk0++lOLiU3nxxbvCLilU9fUpysvXsHTpP1Nevob6+uB/NKNQQ1TqWL16M7NmreWqq55i1qy1rF69eVT3f0wLsYw2M5sOrAP+DrjB3Qd+QrgfLcQio+ngwT0UF59KcXFHT1trayGtra8xdWp5iJWFI5Nppbx8DW1tnT1tiUQBe/asCmwlrijUEJU6UqlmZs1ae1h7U9PKIa/EdbSFWMIa8d8NfA3oOtoDzGyVmTWaWWMmkwmuMhn3DhxooqNjYr+2jo6JHDjQFFJF4UqnWygq6h8FhYUTSKdbYlVDVOpoaNg/pPbhCDz4zWw5cMDdt77f49x9jbtXuXtVWVlZQNVJHJx00iwKCw/1ayssPMRJJ80KqaJwJZOltLf3H4N1dHSRTJbGqoao1FFdPW1I7cMRxoh/IXCRmaXJLuRynpn9NIQ6JKamTi1n27bv0NpaSEvLJFpbC9m27TuxnOYBKCsrpq5uGYlEASUlRSQSBdTVLQt0iiUKNUSljoqKKdTWVvZrq62tHNVF30OZ4+85uNli4EbN8UsYDh7cw4EDTZx00qzYhn5fmUwr6XQLyWRp4IEbpRqiUkcq1UxDw36qq6cNO/SHtdi6yHg2dWq5Ar+PsrLiUMM2KjVEpY6KiimjOsrvK9Tgd/fngOfCrEFEJG70yV0RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYkZBb+ISMwo+EVEYkbBLyISMwp+EZGYUfCLiMSMgl9EJGYU/CIiMaPgFxGJmZgEfwbYktuGWEWmlS1b9pPJtMa6hqjYtKmBv/3bf2TTpoZQ64jKc5JKNbNu3Q5SqebQaohKX0RBXp8Pd4/8v/nz5/vwrXf3hLuX5rbrR7CvEVSxvskTie95aen3PZH4nq9f3xTLGqLiq19d5e++W+hvvz3J33230L/61VWh1BGV56S29hmHf+j5V1v7TOA1RKUvomC0ng+g0Y+QqaGuwHWshr8CVwYoB9r6tCWAPUBw6/hmMq2Ul6+hra2zt4pEAXv2rApssYco1BAVmzY1sGjRIoqLO3raWlsLeeGFF7jggurA6ojKc5JKNTNr1trD2puaVuZtIZCBotIXUTCaz8fRVuAa51M9aaBoQFthrj3AKtItFBX17+rCwg
mk0y2xqiEqGhp+S3v7xH5tHR0TaGj4baB1ROU5aWjYP6T2fIhKX0RBEM/HOA/+JNA+oK0j1x5gFclS2tu7+lfR0UUyWRqrGqKiunoBRUWH+rUVFnZRXb0g0Dqi8pxUV08bUns+RKUvoiCI52OcB38ZUEd2eqckt60jyGkeyK7fWVe3jESigJKSIhKJAurqlgX6J2wUaoiKCy6o5rbbVtLaWkhLywdobS3ktttWBjrNA9F5TioqplBbW9mvrba2MrBpHohOX0RBEM/HOJ/j75YhO72TJOjQ71dFppV0uoVksjS0EzoKNUTFpk0NNDT8lurqBYGHfl9ReU5SqWYaGvZTXT0t0NDvKyp9EQWj8XwcbY4/JsEvIhI/MX1xV0REBlLwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZhR8IuIxIyCX0QkZhT8IiIxo+AXEYmZwIPfzE4xs2fNLGVmO83s+qBrCE8UFn2PQg3RqGP37l1s3PgAu3fvCq2GrPD7IiqisOB7HIQx4u8EvuLuFcAC4FozmxVCHQGrJ7v+79Lctj6mNUSjjrVrv8H06XNZtGgl06fPZe3abwReQ1b4fREVq1dvZtastVx11VPMmrWW1as3h13SuBX69fjN7DHgXnd/5miPGfvX44/Cou9RqCEadezevYvp0+cettj6vn2vMnPmRwOpISv8voiKKCz4Ph5F8nr8ZpYEzgR+d4SvrTKzRjNrzGTG+p/AacJf9D0KNUSjjt27tx1xsfXdu7cFVkNWmrD7IiqisOB7nIQW/GZ2PPAQ8CV3/4+BX3f3Ne5e5e5VZWVjffSTJPxF36NQQzTqmDnzzCMutj5z5pmB1ZCVJOy+iIooLPgeJ6EEv5kVkg39n7n7w2HUEKwoLPoehRqiUcfMmR/lwQdv7LfY+oMP3hjwNA9EoS+iIgoLvsdJ4HP8ZmbAOuAtd//SsXzP2J/j7xaFRd+jUEM06ti9exe7d29j5swzQwj9vsLvi6iIwoLv40lkFls3s0XA88CrQFeu+Rvu/sTRvmf8BL+ISHCOFvwFQRfi7i8AFvRxRUQkS5/cFRGJGQW/iEjMKPhFRGJGwS8iEjMKfhGRmFHwi4jEjIJfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURiRsEvIhIzCn4RkZiJSfBfAhyf24bpK2TXWP1KiDXcSXa1yztDrCEqdbwI3JrbhikDbMltRfIv9MXWj8XIrsd/pCtAh/F/nkjv8gPd9zsDruE4oHXA/XcCriEqdVwAPDPg/tMB1wBQD9SQXXu3newKXCtCqEPGo0gutp5/RxvhBz3y/wr9Qx/gEMGO/O+kf9gCvEvwI+4o1PEi/UMfYBPBj/wzZEO/DWjJbWvQyF/ybZwH/+YhtufLhiG250P9ENvzJQp1bBpie76kyY70+yrMtYvkzzgP/iVDbM+Xy4bYng9Hmz4IelohCnVcMMT2fEmSnd7pqyPXLpI/muMPTAHZ6Z1uYczxH092WqVbWHP8UahjGf1H+GHP8ReSDX3N8cvoiekcP2RD/mKy4XIx4YQ+ZEP+BuA/57ZBhz5kw/XvgcrcNozQj0odTwMvALfktmGEPmRDfg/Z6cc9KPQlCDEY8YuIxFOMR/wiItKXgl9EJGYU/CIiMaPgFxGJGQW/iEjMKPhFRGJGwS8iEjNj4n38ZpYh++mW8WAqcDDsIiJCfdFLfdFLfdFrpH1R7u5lAxvHRPCPJ2bWeKQPVMSR+qKX+qKX+qJXvvpCUz0iIjGj4BcRiRkFf/DWhF1AhKgveqkveqkveuWlLzTHLyISMxrxi4jEjIJfRCRmFPwBMLNTzOxZM0uZ2U4zuz7smsJmZhPNbJuZbQy7lrCZ2QlmtsHM/jV3jvx12DWFxcy+nPsZ2WFm9WY2KeyagmJm95nZATPb0aftg2b2jJm9ltueOBrHUvAHoxP4irtXAAuAa81sVsg1he16IBV2ERHxfeApdz8NOI
OY9ouZnQxcB1S5+xyy65N+JtyqAnU/8PEBbTcDv3L3U4Ff5e6PmII/AO6+391fzt3+M9kf7JPDrSo8ZjYduBD4cdi1hM3MSoC/IbvYLu7e7u7/L9yqQlUAJMysACgG/hRyPYFx918Dbw1ovhhYl7u9DrhkNI6l4A+YmSWBM4HfhVtJqO4GvgZ0hV1IBHwYyABrc1NfPzaz48IuKgzu/m/AXcBeYD/Q4u6bwq0qdB9y9/2QHUACJ43GThX8ATKz44GHgC+5+3+EXU8YzGw5cMDdt4ZdS0QUAPOAH7r7mcC7jNKf82NNbv76YmAG8FfAcWZ2RbhVjU8K/oCYWSHZ0P+Zuz8cdj0hWghcZGZp4AHgPDP7abglhWofsM/du/8C3ED2F0EcLQHecPeMu3cADwNnh1xT2P7dzKYB5LYHRmOnCv4AmJmRncNNuft3w64nTO7+dXef7u5Jsi/c/Yu7x3ZU5+7/F3jTzD6aazofaAqxpDDtBRaYWXHuZ+Z8YvpCdx+PA1fmbl8JPDYaOy0YjZ3IoBYC/x141cy259q+4e5PhFiTRMdq4GdmVgT8EVgZcj2hcPffmdkG4GWy74TbRowu32Bm9cBiYKqZ7QNuBe4Afm5mNWR/MX5qVI6lSzaIiMSLpnpERGJGwS8iEjMKfhGRmFHwi4jEjIJfRCRmFPwigJn9JzN7wMxeN7MmM3vCzGb2vVKiyHih9/FL7OU+LPQIsM7dP5NrqwQ+FGphInmiEb8InAt0uPv/7m5w9+3Am933zewqM7u3z/2NZrY4d/sdM/t7M9tqZpvNrNrMnjOzP5rZRX2+/zEze8rMdpnZrYH970QGUPCLwBxgJBeNOw54zt3nA38G/hZYCnwS+Hafx1UDlwOVwKfMrGoExxQZNk31iIxcO/BU7varwF/cvcPMXgWSfR73jLs3A5jZw8AioDHIQkVAI34RgJ3A/EEe00n/n5e+SwJ2eO+1T7qAvwC4exf9B1cDr4+i66VIKBT8IvAvwAfM7JruBjM7Cyjv85g0UGlmE8zsFLLTNkO1NLeGaoLsSkovjqBmkWFT8Evs5UbrnyQbzK+b2U7gNvov+/ci8AbZqZy7yF5BcqheAH4CbAcecndN80godHVOkQCY2ZYRTzgAAAA2SURBVFVkFxGvDbsWEY34RURiRiN+EZGY0YhfRCRmFPwiIjGj4BcRiRkFv4hIzCj4RURi5v8DIiJIkAYBN7sAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"ax = cell_df[cell_df['Class'] == 4][0:50].plot(kind='scatter', x='Clump', y='UnifSize', color='DarkBlue', label='malignant');\n",
"cell_df[cell_df['Class'] == 2][0:50].plot(kind='scatter', x='Clump', y='UnifSize', color='Yellow', label='benign', ax=ax);\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data pre-processing and selection"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets first look at columns data types:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ID int64\n",
"Clump int64\n",
"UnifSize int64\n",
"UnifShape int64\n",
"MargAdh int64\n",
"SingEpiSize int64\n",
"BareNuc object\n",
"BlandChrom int64\n",
"NormNucl int64\n",
"Mit int64\n",
"Class int64\n",
"dtype: object"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cell_df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It looks like the __BareNuc__ column includes some values that are not numerical. We can drop those rows:"
]
},
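{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the filtering step concrete, here is a minimal plain-Python sketch of the same idea: try to convert each value to a number, drop the rows where that fails, and store the rest as integers. The toy rows, the `?` placeholder, and the helper name `to_int_or_none` are assumptions for illustration only; the lab itself does this with pandas.\n",
"\n",
"```python\n",
"# Toy records; the '?' placeholder for a missing value is an assumption.\n",
"rows = [\n",
"    {'ID': 1, 'BareNuc': '1'},\n",
"    {'ID': 2, 'BareNuc': '?'},\n",
"    {'ID': 3, 'BareNuc': '10'},\n",
"]\n",
"\n",
"def to_int_or_none(text):\n",
"    # Mirrors the coerce idea: unparseable values become a missing marker.\n",
"    try:\n",
"        return int(text)\n",
"    except ValueError:\n",
"        return None\n",
"\n",
"cleaned = []\n",
"for row in rows:\n",
"    value = to_int_or_none(row['BareNuc'])\n",
"    if value is not None:\n",
"        # Keep the row and store the field as an integer.\n",
"        cleaned.append({**row, 'BareNuc': value})\n",
"\n",
"print([r['ID'] for r in cleaned])\n",
"```\n",
"\n",
"Dropping rows is the simplest choice; imputing a value instead would keep more data but is outside the scope of this lab."
]
},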
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ID int64\n",
"Clump int64\n",
"UnifSize int64\n",
"UnifShape int64\n",
"MargAdh int64\n",
"SingEpiSize int64\n",
"BareNuc int64\n",
"BlandChrom int64\n",
"NormNucl int64\n",
"Mit int64\n",
"Class int64\n",
"dtype: object"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cell_df = cell_df[pd.to_numeric(cell_df['BareNuc'], errors='coerce').notnull()]\n",
"cell_df['BareNuc'] = cell_df['BareNuc'].astype('int')\n",
"cell_df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 5, 1, 1, 1, 2, 1, 3, 1, 1],\n",
" [ 5, 4, 4, 5, 7, 10, 3, 2, 1],\n",
" [ 3, 1, 1, 1, 2, 2, 3, 1, 1],\n",
" [ 6, 8, 8, 1, 3, 4, 3, 7, 1],\n",
" [ 4, 1, 1, 3, 2, 1, 3, 1, 1]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"feature_df = cell_df[['Clump', 'UnifSize', 'UnifShape', 'MargAdh', 'SingEpiSize', 'BareNuc', 'BlandChrom', 'NormNucl', 'Mit']]\n",
"X = np.asarray(feature_df)\n",
"X[0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We want the model to predict the value of Class (that is, benign (=2) or malignant (=4)). As this field can have one of only two possible values, we need to change its measurement level to reflect this."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 2, 2, 2, 2])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cell_df['Class'] = cell_df['Class'].astype('int')\n",
"y = np.asarray(cell_df['Class'])\n",
"y [0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Train/Test dataset"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, we split our dataset into train and test set:"
]
},
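{
"cell_type": "markdown",
"metadata": {},
"source": [
"For intuition, a hold-out split can be sketched in plain Python as a seeded shuffle followed by a cut. This is a simplified sketch under stated assumptions, not scikit-learn's implementation: the function name `simple_train_test_split` is hypothetical, and its shuffle order will differ from `train_test_split`. Rounding the test size up does give the same 546/137 sizes for 683 rows.\n",
"\n",
"```python\n",
"import math\n",
"import random\n",
"\n",
"def simple_train_test_split(rows, test_fraction=0.2, seed=4):\n",
"    # Shuffle the row indices reproducibly, then cut off the test part.\n",
"    indices = list(range(len(rows)))\n",
"    random.Random(seed).shuffle(indices)\n",
"    n_test = math.ceil(len(rows) * test_fraction)\n",
"    test = [rows[i] for i in indices[:n_test]]\n",
"    train = [rows[i] for i in indices[n_test:]]\n",
"    return train, test\n",
"\n",
"# 683 stand-in rows, the number of records left after cleaning.\n",
"train, test = simple_train_test_split(list(range(683)))\n",
"print(len(train), len(test))\n",
"```\n",
"\n",
"Fixing the seed keeps the split reproducible, which is the role `random_state` plays in the lab's call."
]
},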
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train set: (546, 9) (546,)\n",
"Test set: (137, 9) (137,)\n"
]
}
],
"source": [
"X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)\n",
"print ('Train set:', X_train.shape, y_train.shape)\n",
"print ('Test set:', X_test.shape, y_test.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"modeling\">Modeling (SVM with Scikit-learn)</h2>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The SVM algorithm offers a choice of kernel functions for performing its processing. Basically, mapping data into a higher dimensional space is called kernelling. The mathematical function used for the transformation is known as the kernel function, and can be of different types, such as:\n",
"\n",
" 1.Linear\n",
" 2.Polynomial\n",
" 3.Radial basis function (RBF)\n",
" 4.Sigmoid\n",
"Each of these functions has its characteristics, its pros and cons, and its equation, but as there's no easy way of knowing which function performs best with any given dataset, we usually choose different functions in turn and compare the results. Let's just use the default, RBF (Radial Basis Function) for this lab."
]
},
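{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four kernel types listed above can be written out for a pair of feature vectors as in the sketch below. The formulas are the standard textbook forms; the parameter defaults (`gamma`, `coef0`, `degree`) and the two sample vectors are assumptions chosen for illustration, not the values scikit-learn uses internally.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def dot(u, v):\n",
"    return sum(a * b for a, b in zip(u, v))\n",
"\n",
"def linear_kernel(u, v):\n",
"    # Plain inner product.\n",
"    return dot(u, v)\n",
"\n",
"def polynomial_kernel(u, v, gamma=1.0, coef0=1.0, degree=3):\n",
"    # (gamma * <u, v> + coef0) ** degree\n",
"    return (gamma * dot(u, v) + coef0) ** degree\n",
"\n",
"def rbf_kernel(u, v, gamma=0.5):\n",
"    # exp(-gamma * squared Euclidean distance)\n",
"    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))\n",
"    return math.exp(-gamma * squared_distance)\n",
"\n",
"def sigmoid_kernel(u, v, gamma=0.1, coef0=0.0):\n",
"    # tanh(gamma * <u, v> + coef0)\n",
"    return math.tanh(gamma * dot(u, v) + coef0)\n",
"\n",
"u = [5, 1, 1]\n",
"v = [5, 4, 4]\n",
"for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):\n",
"    print(kernel.__name__, kernel(u, v))\n",
"```\n",
"\n",
"Note that the RBF kernel equals 1 for identical vectors and decays toward 0 as they move apart, which is why it behaves like a similarity measure."
]
},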
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/jupyterlab/conda/envs/python/lib/python3.6/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.\n",
" \"avoid this warning.\", FutureWarning)\n"
]
},
{
"data": {
"text/plain": [
"SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,\n",
" decision_function_shape='ovr', degree=3, gamma='auto_deprecated',\n",
" kernel='rbf', max_iter=-1, probability=False, random_state=None,\n",
" shrinking=True, tol=0.001, verbose=False)"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn import svm\n",
"clf = svm.SVC(kernel='rbf')\n",
"clf.fit(X_train, y_train) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"After being fitted, the model can then be used to predict new values:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([2, 4, 2, 4, 2])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"yhat = clf.predict(X_test)\n",
"yhat [0:5]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"evaluation\">Evaluation</h2>"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.metrics import classification_report, confusion_matrix\n",
"import itertools"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"def plot_confusion_matrix(cm, classes,\n",
" normalize=False,\n",
" title='Confusion matrix',\n",
" cmap=plt.cm.Blues):\n",
" \"\"\"\n",
" This function prints and plots the confusion matrix.\n",
" Normalization can be applied by setting `normalize=True`.\n",
" \"\"\"\n",
" if normalize:\n",
" cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]\n",
" print(\"Normalized confusion matrix\")\n",
" else:\n",
" print('Confusion matrix, without normalization')\n",
"\n",
" print(cm)\n",
"\n",
" plt.imshow(cm, interpolation='nearest', cmap=cmap)\n",
" plt.title(title)\n",
" plt.colorbar()\n",
" tick_marks = np.arange(len(classes))\n",
" plt.xticks(tick_marks, classes, rotation=45)\n",
" plt.yticks(tick_marks, classes)\n",
"\n",
" fmt = '.2f' if normalize else 'd'\n",
" thresh = cm.max() / 2.\n",
" for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):\n",
" plt.text(j, i, format(cm[i, j], fmt),\n",
" horizontalalignment=\"center\",\n",
" color=\"white\" if cm[i, j] > thresh else \"black\")\n",
"\n",
" plt.tight_layout()\n",
" plt.ylabel('True label')\n",
" plt.xlabel('Predicted label')"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 2 1.00 0.94 0.97 90\n",
" 4 0.90 1.00 0.95 47\n",
"\n",
" micro avg 0.96 0.96 0.96 137\n",
" macro avg 0.95 0.97 0.96 137\n",
"weighted avg 0.97 0.96 0.96 137\n",
"\n",
"Confusion matrix, without normalization\n",
"[[85 5]\n",
" [ 0 47]]\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"# Compute confusion matrix\n",
"cnf_matrix = confusion_matrix(y_test, yhat, labels=[2,4])\n",
"np.set_printoptions(precision=2)\n",
"\n",
"print (classification_report(y_test, yhat))\n",
"\n",
"# Plot non-normalized confusion matrix\n",
"plt.figure()\n",
"plot_confusion_matrix(cnf_matrix, classes=['Benign(2)','Malignant(4)'],normalize= False, title='Confusion matrix')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also easily use the __f1_score__ from sklearn library:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9639038982104676"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import f1_score\n",
"f1_score(y_test, yhat, average='weighted') "
]
},
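{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a cross-check, the weighted F1 score can be recomputed by hand from the confusion matrix printed earlier. The sketch below uses only the four counts from that output; the helper name `per_class_f1` is an assumption for this example.\n",
"\n",
"```python\n",
"# Confusion matrix from the earlier output: rows are true labels (2, 4),\n",
"# columns are predicted labels (2, 4).\n",
"cm = [[85, 5],\n",
"      [0, 47]]\n",
"\n",
"def per_class_f1(cm, k):\n",
"    tp = cm[k][k]\n",
"    fp = sum(cm[i][k] for i in range(len(cm))) - tp  # predicted k, truly other\n",
"    fn = sum(cm[k]) - tp                             # truly k, predicted other\n",
"    precision = tp / (tp + fp)\n",
"    recall = tp / (tp + fn)\n",
"    return 2 * precision * recall / (precision + recall)\n",
"\n",
"# Support is the number of true samples in each class (90 and 47).\n",
"support = [sum(row) for row in cm]\n",
"f1_values = [per_class_f1(cm, k) for k in range(len(cm))]\n",
"\n",
"# The weighted average weights each class F1 by its support.\n",
"weighted_f1 = sum(f * s for f, s in zip(f1_values, support)) / sum(support)\n",
"print(round(weighted_f1, 4))\n",
"```\n",
"\n",
"The per-class values round to 0.97 and 0.95, matching the classification report, and the weighted result agrees with the `f1_score` output."
]
},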
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets try jaccard index for accuracy:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.9635036496350365"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from sklearn.metrics import jaccard_similarity_score\n",
"jaccard_similarity_score(y_test, yhat)"
]
},
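{
"cell_type": "markdown",
"metadata": {},
"source": [
"For single-label problems like this one, `jaccard_similarity_score` reduces to plain accuracy, which is why the value above equals 132/137. To our knowledge the function was deprecated in scikit-learn 0.21 and later removed in favor of `jaccard_score`, so newer environments may need a different call. The sketch below recomputes both the accuracy and the per-class Jaccard index (TP / (TP + FP + FN)) from the confusion matrix; the helper name `per_class_jaccard` is an assumption for this example.\n",
"\n",
"```python\n",
"# Confusion matrix from the earlier output: rows are true labels (2, 4),\n",
"# columns are predicted labels (2, 4).\n",
"cm = [[85, 5],\n",
"      [0, 47]]\n",
"\n",
"total = sum(sum(row) for row in cm)\n",
"correct = sum(cm[k][k] for k in range(len(cm)))\n",
"accuracy = correct / total\n",
"\n",
"def per_class_jaccard(cm, k):\n",
"    tp = cm[k][k]\n",
"    fp = sum(cm[i][k] for i in range(len(cm))) - tp\n",
"    fn = sum(cm[k]) - tp\n",
"    # Size of the intersection over size of the union for class k.\n",
"    return tp / (tp + fp + fn)\n",
"\n",
"jaccard_benign = per_class_jaccard(cm, 0)\n",
"jaccard_malignant = per_class_jaccard(cm, 1)\n",
"print(round(accuracy, 4), round(jaccard_benign, 4), round(jaccard_malignant, 4))\n",
"```\n",
"\n",
"The per-class values are lower than the accuracy because each one ignores the correct predictions made for the other class."
]
},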
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<h2 id=\"practice\">Practice</h2>\n",
"Can you rebuild the model, but this time with a __linear__ kernel? You can use __kernel='linear'__ option, when you define the svm. How the accuracy changes with the new kernel function?"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# write your code here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Double-click __here__ for the solution.\n",
"\n",
"<!-- Your answer is below:\n",
" \n",
"clf2 = svm.SVC(kernel='linear')\n",
"clf2.fit(X_train, y_train) \n",
"yhat2 = clf2.predict(X_test)\n",
"print(\"Avg F1-score: %.4f\" % f1_score(y_test, yhat2, average='weighted'))\n",
"print(\"Jaccard score: %.4f\" % jaccard_similarity_score(y_test, yhat2))\n",
"\n",
"-->"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"<h2>Want to learn more?</h2>\n",
"\n",
"IBM SPSS Modeler is a comprehensive analytics platform that has many machine learning algorithms. It has been designed to bring predictive intelligence to decisions made by individuals, by groups, by systems – by your enterprise as a whole. A free trial is available through this course, available here: <a href=\"http://cocl.us/ML0101EN-SPSSModeler\">SPSS Modeler</a>\n",
"\n",
"Also, you can use Watson Studio to run these notebooks faster with bigger datasets. Watson Studio is IBM's leading cloud solution for data scientists, built by data scientists. With Jupyter notebooks, RStudio, Apache Spark and popular libraries pre-packaged in the cloud, Watson Studio enables data scientists to collaborate on their projects without having to install anything. Join the fast-growing community of Watson Studio users today with a free account at <a href=\"https://cocl.us/ML0101EN_DSX\">Watson Studio</a>\n",
"\n",
"<h3>Thanks for completing this lesson!</h3>\n",
"\n",
"<h4>Author: <a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a></h4>\n",
"<p><a href=\"https://ca.linkedin.com/in/saeedaghabozorgi\">Saeed Aghabozorgi</a>, PhD is a Data Scientist in IBM with a track record of developing enterprise level applications that substantially increases clients’ ability to turn data into actionable knowledge. He is a researcher in data mining field and expert in developing advanced analytic methods like machine learning and statistical modelling on large datasets.</p>\n",
"\n",
"<hr>\n",
"\n",
"<p>Copyright &copy; 2018 <a href=\"https://cocl.us/DX0108EN_CC\">Cognitive Class</a>. This notebook and its source code are released under the terms of the <a href=\"https://bigdatauniversity.com/mit-license/\">MIT License</a>.</p>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python",
"language": "python",
"name": "conda-env-python-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}