  • Save cdeweyx/d185c0076c41957c3baf00e51f3c9bff to your computer and use it in GitHub Desktop.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Kaggle March Madness Challenge 2018"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Google Cloud and NCAA® have teamed up to bring you this year’s version of the Kaggle machine learning competition. Another year, another chance to anticipate the upsets, call the probabilities, and put your bracketology skills to the leaderboard test. Kagglers will join the millions of fans who attempt to forecast the outcomes of March Madness® during this year's NCAA Division I Men’s and Women’s Basketball Championships. But unlike most fans, you will pick your bracket using a combination of NCAA’s historical data and your computing power, while the ground truth unfolds on national television."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Challenge Home: https://www.kaggle.com/c/mens-machine-learning-competition-2018\n",
"- Basic Logistic Regression Starter Kernel: https://www.kaggle.com/osciiart/basic-starter-kernel-ncaa-men-s-dataset-with-jp\n",
"- Least Squares Starter Kernel: https://www.kaggle.com/baeng72/basic-least-squares-ratings\n",
"- NCAA Tournaments Competition Walkthrough: https://www.kaggle.com/asindico/ncaa-tournaments-competition-walkthrough\n",
"- Basic Starter Kernel: https://www.kaggle.com/juliaelliott/basic-starter-kernel-ncaa-men-s-dataset\n",
"- Feature Engineering with Advanced Statistics: https://www.kaggle.com/lnatml/feature-engineering-with-advanced-stats\n",
"- FiveThirtyEight Elo Ratings: https://www.kaggle.com/lpkirwin/fivethirtyeight-elo-ratings\n",
"- Extensive NCAA Exploratory Analysis: https://www.kaggle.com/captcalculator/a-very-extensive-ncaa-exploratory-analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preparation\n",
"Import packages and load the initial datasets."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import pickle\n",
"\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.utils import shuffle\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import log_loss\n",
"from sklearn import preprocessing\n",
"from sklearn import model_selection\n",
"from sklearn.metrics import confusion_matrix\n",
"from sklearn.metrics import classification_report\n",
"\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_seeds = pd.read_csv(data_dir + 'NCAATourneySeeds.csv')\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Seed</th>\n",
" <th>TeamID</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>W01</td>\n",
" <td>1207</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>W02</td>\n",
" <td>1210</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>W03</td>\n",
" <td>1228</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>W04</td>\n",
" <td>1260</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>W05</td>\n",
" <td>1374</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Seed TeamID\n",
"0 1985 W01 1207\n",
"1 1985 W02 1210\n",
"2 1985 W03 1228\n",
"3 1985 W04 1260\n",
"4 1985 W05 1374"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_seeds.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>WScore</th>\n",
" <th>LTeamID</th>\n",
" <th>LScore</th>\n",
" <th>WLoc</th>\n",
" <th>NumOT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1116</td>\n",
" <td>63</td>\n",
" <td>1234</td>\n",
" <td>54</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1120</td>\n",
" <td>59</td>\n",
" <td>1345</td>\n",
" <td>58</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1207</td>\n",
" <td>68</td>\n",
" <td>1250</td>\n",
" <td>43</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1229</td>\n",
" <td>58</td>\n",
" <td>1425</td>\n",
" <td>55</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1242</td>\n",
" <td>49</td>\n",
" <td>1325</td>\n",
" <td>38</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID WScore LTeamID LScore WLoc NumOT\n",
"0 1985 136 1116 63 1234 54 N 0\n",
"1 1985 136 1120 59 1345 58 N 0\n",
"2 1985 136 1207 68 1250 43 N 0\n",
"3 1985 136 1229 58 1425 55 N 0\n",
"4 1985 136 1242 49 1325 38 N 0"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tour.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Seed Based Logistic Regression\n",
"Use only the tournament seeds to predict the winner and its win probability; this serves as the baseline model."
]
},
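{
"cell_type": "markdown",
"metadata": {},
"source": [
"This baseline fits a one-feature logistic regression, `P(Team 1 wins) = 1 / (1 + exp(-(w * SeedDiff + b)))`, where `SeedDiff` is Team 1's seed minus Team 2's. The sketch below only illustrates the idea on a handful of hypothetical games (not the tournament data): each game is listed from both sides, a negative seed difference (Team 1 better seeded) should give a probability above 0.5, and because the data are mirrored the fitted intercept lands near zero, so `SeedDiff = 0` gives roughly 0.5.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Hypothetical games, each listed twice: (SeedDiff, Team 1 won) and (-SeedDiff, Team 1 lost).\n",
"# The (-1, 0) / (1, 1) pair is an upset, so the data are not perfectly separable.\n",
"seed_diff = np.array([-15, 15, -11, 11, -5, 5, -1, 1, -3, 3]).reshape(-1, 1)\n",
"won = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0])\n",
"\n",
"model = LogisticRegression(C=1.0).fit(seed_diff, won)\n",
"\n",
"# P(Team 1 wins) for a better-seeded, equally seeded and worse-seeded Team 1\n",
"p = model.predict_proba(np.array([[-8], [0], [8]]))[:, 1]\n",
"print(model.coef_[0, 0], model.intercept_[0], p)\n",
"```"
]
},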
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Seed</th>\n",
" <th>TeamID</th>\n",
" <th>Seed_int</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>W01</td>\n",
" <td>1207</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>W02</td>\n",
" <td>1210</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>W03</td>\n",
" <td>1228</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>W04</td>\n",
" <td>1260</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>W05</td>\n",
" <td>1374</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Seed TeamID Seed_int\n",
"0 1985 W01 1207 1\n",
"1 1985 W02 1210 2\n",
"2 1985 W03 1228 3\n",
"3 1985 W04 1260 4\n",
"4 1985 W05 1374 5"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
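"# Convert seed to int: seeds look like 'W01' or 'X16a' (region letter, two-digit seed,\n",
"# optional play-in letter), so characters 1-2 hold the numeric seed\n",
"df_seeds['Seed_int'] = df_seeds['Seed'].str[1:3]\n",
"df_seeds['Seed_int'] = df_seeds['Seed_int'].apply(pd.to_numeric)\n",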
"df_seeds.head()"
]
},
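{
"cell_type": "markdown",
"metadata": {},
"source": [
"`Seed` values combine a region letter, a two-digit seed and, for play-in teams, a trailing `a` or `b` (for example `W01` or `X16a`), which is why slicing characters 1 and 2 works. As a slightly more defensive alternative, sketched here on hypothetical values, `Series.str.extract` can pull out the first two-digit run so the conversion does not rely on fixed positions.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical seed strings in the W01 / X16a format (region, seed, optional play-in letter)\n",
"seeds = pd.Series(['W01', 'X16a', 'Y16b', 'Z11'])\n",
"\n",
"# Extract the first run of two digits and convert it to int\n",
"seed_int = seeds.str.extract('([0-9]{2})', expand=False).astype(int)\n",
"print(seed_int.tolist())\n",
"```"
]
},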
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Drop unnecessary columns\n",
"df_seeds.drop(labels=['Seed'], inplace=True, axis=1)\n",
"df_tour.drop(labels=['DayNum', 'WScore', 'LScore', 'WLoc', 'NumOT'], inplace=True, axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>TeamID</th>\n",
" <th>Seed_int</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>1207</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>1210</td>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>1228</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>1260</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>1374</td>\n",
" <td>5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season TeamID Seed_int\n",
"0 1985 1207 1\n",
"1 1985 1210 2\n",
"2 1985 1228 3\n",
"3 1985 1260 4\n",
"4 1985 1374 5"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_seeds.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season WTeamID LTeamID\n",
"0 1985 1116 1234\n",
"1 1985 1120 1345\n",
"2 1985 1207 1250\n",
"3 1985 1229 1425\n",
"4 1985 1242 1325"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_tour.head()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>WSeed</th>\n",
" <th>LSeed</th>\n",
" <th>SeedDiff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" <td>9</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" <td>11</td>\n",
" <td>6</td>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" <td>1</td>\n",
" <td>16</td>\n",
" <td>-15</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" <td>9</td>\n",
" <td>8</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" <td>3</td>\n",
" <td>14</td>\n",
" <td>-11</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season WTeamID LTeamID WSeed LSeed SeedDiff\n",
"0 1985 1116 1234 9 8 1\n",
"1 1985 1120 1345 11 6 5\n",
"2 1985 1207 1250 1 16 -15\n",
"3 1985 1229 1425 9 8 1\n",
"4 1985 1242 1325 3 14 -11"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Attach the winner's and the loser's seed to each tournament game, then compute SeedDiff = WSeed - LSeed\n",
"df_winseeds = df_seeds.rename(columns={'TeamID':'WTeamID', 'Seed_int':'WSeed'})\n",
"df_lossseeds = df_seeds.rename(columns={'TeamID':'LTeamID', 'Seed_int':'LSeed'})\n",
"df_dummy = pd.merge(left=df_tour, right=df_winseeds, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_lossseeds, on=['Season', 'LTeamID'])\n",
"df_concat['SeedDiff'] = df_concat.WSeed - df_concat.LSeed\n",
"df_concat.head()"
]
},
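{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two merges attach the winner's seed and then the loser's seed to every game. The sketch below repeats the pattern on hypothetical IDs; `validate='many_to_one'` is a standard `pandas.merge` argument that raises an error if a (Season, TeamID) pair appeared twice in the seed table, which would otherwise silently duplicate games.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical seed table and tournament results (IDs and seeds are made up)\n",
"seeds = pd.DataFrame({'Season': [2018, 2018, 2018],\n",
"                      'TeamID': [1101, 1102, 1103],\n",
"                      'Seed_int': [1, 16, 8]})\n",
"games = pd.DataFrame({'Season': [2018, 2018],\n",
"                      'WTeamID': [1101, 1102],\n",
"                      'LTeamID': [1103, 1101]})\n",
"\n",
"win = seeds.rename(columns={'TeamID': 'WTeamID', 'Seed_int': 'WSeed'})\n",
"loss = seeds.rename(columns={'TeamID': 'LTeamID', 'Seed_int': 'LSeed'})\n",
"\n",
"# many_to_one: each game row may match at most one seed row\n",
"merged = (games.merge(win, how='left', on=['Season', 'WTeamID'], validate='many_to_one')\n",
"               .merge(loss, how='left', on=['Season', 'LTeamID'], validate='many_to_one'))\n",
"merged['SeedDiff'] = merged['WSeed'] - merged['LSeed']\n",
"print(merged)\n",
"```"
]
},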
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>SeedDiff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>-15</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-11</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" SeedDiff Result\n",
"0 1 1\n",
"1 5 1\n",
"2 -15 1\n",
"3 1 1\n",
"4 -11 1"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
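"# Build a symmetric training set: each game once from the winner's side (Result=1)\n",
"# and once from the loser's side (SeedDiff negated, Result=0)\n",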
"df_wins = pd.DataFrame()\n",
"df_wins['SeedDiff'] = df_concat['SeedDiff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['SeedDiff'] = -df_concat['SeedDiff']\n",
"df_losses['Result'] = 0\n",
"\n",
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
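{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every row of `df_concat` is written from the winner's side, so on its own the target would be all 1s. Mirroring each game (negating `SeedDiff` and setting `Result=0`) gives a balanced, symmetric training set. A minimal sketch on hypothetical seed differences:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical winner-minus-loser seed differences for three games\n",
"seed_diff = pd.Series([1, 5, -15])\n",
"\n",
"wins = pd.DataFrame({'SeedDiff': seed_diff, 'Result': 1})\n",
"losses = pd.DataFrame({'SeedDiff': -seed_diff, 'Result': 0})\n",
"\n",
"# ignore_index avoids duplicate index labels (0, 1, 2, 0, 1, 2)\n",
"train = pd.concat([wins, losses], ignore_index=True)\n",
"print(train)\n",
"```\n",
"\n",
"Unlike the notebook cell, this uses `ignore_index=True`; the original keeps duplicated index labels, which is harmless here because the data are converted to arrays before fitting."
]
},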
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train = df_predictions['SeedDiff'].values.reshape(-1,1)\n",
"Y_train = df_predictions['Result'].values\n",
"X_train, Y_train = shuffle(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best log_loss: -0.5532, with best C: 0.021544346900318846\n"
]
}
],
"source": [
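"# Grid-search the inverse regularization strength C with cross-validation.\n",
"# With scoring='neg_log_loss', best_score_ is the NEGATED mean log loss (higher is better),\n",
"# so the printed score is the log loss with its sign flipped.\n",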
"logreg = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf = GridSearchCV(logreg, params, scoring='neg_log_loss', refit=True)\n",
"clf.fit(X_train, Y_train)\n",
"print('Best log_loss: {:.4}, with best C: {}'.format(clf.best_score_, clf.best_params_['C']))"
]
},
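{
"cell_type": "markdown",
"metadata": {},
"source": [
"`GridSearchCV` always maximizes its score, so scikit-learn exposes log loss as `neg_log_loss`; the reported best score is therefore the log loss with its sign flipped (about 0.553 here). The sketch below, on two hypothetical predictions, checks `sklearn.metrics.log_loss` against the binary cross-entropy formula `-mean(y * ln(p) + (1 - y) * ln(1 - p))`.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.metrics import log_loss\n",
"\n",
"# Hypothetical labels and predicted P(Team 1 wins)\n",
"y_true = np.array([1, 0])\n",
"p_pred = np.array([0.8, 0.3])\n",
"\n",
"# Binary cross-entropy written out by hand\n",
"manual = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))\n",
"library = log_loss(y_true, p_pred)\n",
"print(round(manual, 4), round(library, 4))\n",
"```"
]
},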
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Type</th>\n",
" <th>Log Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Seed Based Logistic Regression</td>\n",
" <td>-0.55315</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Type Log Loss\n",
"0 Seed Based Logistic Regression -0.55315"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Store model results (note: 'Log Loss' holds best_score_, i.e. the negated log loss)\n",
"df_results = pd.DataFrame({'Type': ['Seed Based Logistic Regression'], 'Log Loss': [clf.best_score_]}, columns=['Type', 'Log Loss'])\n",
"df_results.head()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEKCAYAAAD9xUlFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJzt3Xd8VFX6x/HPkwQINfSOBBBEUGqQ\n3qyACqiogBULIIKC6+7q7v7UVbfougoIFuwVFBcUUVGRIl1C7xCKEGlBqdIEzu+Pe4ljTBlIZibl\n+3695pVbztz7zJ3JPHPPPfccc84hIiICEBXpAEREJPdQUhARkVRKCiIikkpJQUREUikpiIhIKiUF\nERFJpaQgIiKplBRERCSVkoKIiKSKiXQAZ6p8+fIuPj4+0mGIiOQpixYt2uOcq5BVuTyXFOLj40lM\nTIx0GCIieYqZfR9MOVUfiYhIKiUFERFJpaQgIiKplBRERCSVkoKIiKRSUhARkVRKCiIikqrAJIWV\nP+xn5Dcb+PHQsUiHIiKSaxWYpDA7aQ/Pfr2e1v+exp8/Ws7anQciHZKISK6T5+5oPlsDO9bh0vMr\n8vqcLUxYnMwHidtod2557mgXT6d6FYmKskiHKCISceaci3QMZyQhIcFlt5uLvT8f5/3vtvL2vC3s\nOnCM2uWL069tPNc1r06xwgUmT4pIAWJmi5xzCVmWK4hJ4bRfTp7i8xU7eG32ZpYn76dUbAx9Wp7D\nba3jqVq6aI7sQ0QkN1BSOAPOORZ9v5fX52xmysqdmBldL6jMne1q0fScMjm6LxGRSAg2KaiuBDAz\nEuLLkhBflm0/HeatuVv4YOE2Ji/fQdNzSnNnu1p0aViZmOgCc11eRAoonSlk4NCxE3yUuI035m7h\n+x8PUzUultvbxtO3ZU1KFFEuFZG8RdVHOeTkKce0tbt5bfYm5m/6idLFCnFH21rc1iaeuKKFwhaH\niEh2KCmEwJKtexk1LYlv1u6mZGwM/drEc0e7WpQuVjgi8YiIBCvYpBDSSnIz62Jm68wsycweSmf9\nOWY23cyWmNlyM+sWyniyq+k5ZXjt9hZMHtKONnXKMXJaEm3/PY2npqzVndIiki+E7EzBzKKB9cBl\nQDKwEOjjnFsdUGYMsMQ596KZNQA+d87FZ7bdSJ4ppLV25wFGTUvisxU7iI2J5qaW59C/Q20qloqN\ndGgiIr+RG84ULgKSnHObnHPHgXFAjzRlHFDKn44DtocwnhxXv3IpRvVtxtfDOtL1gsq8MXcL7Z6e\nzqOfrGTH/iORDk9E5IyFMilUA7YFzCf7ywI9BtxsZsnA58CQEMYTMudWLMGzNzZh2h86ck2Tary3\nYCsdn57BXyauYNtPhyMdnohI0EKZFNLrTChtXVUf4E3nXHWgG/COmf0uJjPrb2aJZpaYkpISglBz\nRs1yxXmqVyNm/LETN7SozkeJyXR+ZgZ/HL+MLXt+jnR4IiJZCuU1hdbAY865K/z5hwGcc/8KKLMK\n6OKc2+bPbwJaOed2Z7Td3HRNISs79x/lpZkbGfvdVn45eYoeTaox7NJ6nFOuWKRDE5ECJjdcU1gI\n1DWzWmZWGOgNTEpTZitwCYCZnQ/EArn3VOAMVY6L5bHuDZn1587c1b42X6zcwSXPzuCxSavUWklE\ncqWQ3qfgNzEdDkQDrzvn/mFmjwOJzrlJfoujV4ASeFVLf3LOfZXZNvPSmUJauw4cZfjUDXyYuI3Y\nmCgGdKzDne1qUVx3SItIiOnmtVwsafchnvlyHVNW7aR8iSLcf2ldereoQSH1rSQiIZIbqo8kA+dW\nLMFLtzRnwqA21C5fnP/7eCWXPTuTz5bvIK8laRHJX5QUIqjZOWX4YEArXr89gSIx0dz7/mJ6jp7D\n3I17Ih2aiBRQSgoRZmZcXL8Sn9/fnmeub0zK
wWP0fWUBt73+Hau3axxpEQkvXVPIZY7+cpK3521h\n9PSNHDj6Cz2bVOOBy+pRo6yasYrI2dOF5jxu/+FfeHHmRt6Ysxnn4OZWNRl88bmULa4eWUXkzCkp\n5BM79h9h+NcbGL9oG8ULx3D/pXW5rU28WiqJyBlR66N8okpcUZ7q1YgpQzvQrGYZnvxsDd1GzGJO\nki5Gi0jOU1LII+pVKsmb/Vrwyq0JHD1xkpteXcDAdxaRvFcd7olIzlFSyEPMjMsaVOLrYR35w2X1\nmLF+N5f8dybDp67n6C8nIx2eiOQDSgp5UGyhaIZcUpdv/tCJSxtUYvjUDVz67EymrNypm99EJFuU\nFPKwaqWLMrpvM96/uyXFC8cw8N1F3PLadyTtPhjp0EQkj1JSyAfa1CnPZ/e149GrG7AseR9dhs/i\nycmrOXj0l0iHJiJ5jJJCPhETHUW/trWY/mAnejWvzmtzNtP5mZl8tCiZU6dUpSQiwVFSyGfKlyjC\nv69rxMeD2lK9TFEeHL+M616ay/LkfZEOTUTyACWFfKpxjdJMuKcN/+nViG0/HabH6Dk89L/l7Dt8\nPNKhiUgupqSQj0VFGdcn1GDag524s20txi9K5tJnZzJp2Xa1UhKRdCkpFAClYgvxt6saMGlwW6qW\nLsp9Y5dwx5sL+WHfkUiHJiK5jJJCAdKwahwTB7Xlb1eez/xNP3HZszN5ffZmTupCtIj4lBQKmOgo\n4672tflqWAdaxJfl8cmrufbFuazZobEbRERJocCqUbYYb/ZrwYjeTUj+6TBXPz+bp6asVXcZIgWc\nkkIBZmb0aFKNqQ90pGfTarw4YyNdhn/LXPXAKlJghTQpmFkXM1tnZklm9lA6658zs6X+Y72ZqTF9\nBJQpXphnrm/Me3e1xAF9X13AH8cvU/NVkQIoZIPsmFk0sB64DEgGFgJ9nHOrMyg/BGjqnLsjs+0W\ntEF2wu3I8ZOM+GYDr8zaRJlihXjk6oZc3agKZhbp0EQkG3LDIDsXAUnOuU3OuePAOKBHJuX7AGND\nGI8EoWjhaB7qWv83zVfvfCtRzVdFCohQJoVqwLaA+WR/2e+YWU2gFjAthPHIGQhsvjpv449qvipS\nQIQyKaRX35DRN0pv4CPnXLpNX8ysv5klmlliSkpKjgUomUuv+WqfV+az9UeN9iaSX4UyKSQDNQLm\nqwPbMyjbm0yqjpxzY5xzCc65hAoVKuRgiBKM081Xn7m+MWu2H6DLiG95b8H36ipDJB8KZVJYCNQ1\ns1pmVhjvi39S2kJmdh5QBpgXwlgkm8yMXs2r8+WwDjQ7pwx/nbiSW1//jh37da1BJD8JWVJwzp0A\nBgNfAmuAD51zq8zscTPrHlC0DzDO6WdnnlC1dFHevuMinujRkMQte7n8uW+ZsDhZZw0i+UTImqSG\nipqk5h5b9vzMg+OXkfj9Xq5oWIl/XHMh5UsUiXRYIpKO3NAkVfK5+PLF+WBAa/7SrT7T16Zw+XPf\nMmXljkiHJSLZoKQg2RIdZfTvUIfJ97WjaulYBr67mKHjlrD/sMaHFsmLlBQkR9SrVJKJg9oy9NK6\nTF6+g8uHz2TGut2RDktEzpCSguSYQtFRDL20HhMHtSWuaCFuf2MhD09YwaFjJyIdmogESUlBctyF\n1eOYNLgdAzrWZtzCrXQZ/i3zN/0Y6bBEJAhKChISsYWiebjr+Ywf0JroKKP3mPk8/ulqjdcgkssp\nKUhIJcSX5Yv723NLq5q8PmczPUfPYd3Og5EOS0QyoKQgIVescAxP9LyAN/q1YM+hY3QfNZt35m3R\nDW8iuZCSgoRN5/Mq8sX9HWhdpxz/98kq7n47kR8PHYt0WCISQElBwqpCySK8cXsLHr26Ad+u30OX\nEbOYtUE934rkFkoKEnZmRr+2tfj43raULlqIW177jn9+vobjJ05FOjSRAk9JQSKmQdVSTBrcjptb\nncOYbzdx
7Ytz2JhyKNJhiRRoSgoSUUULR/NkzwsZc0tzfth7hKtGzmbcd1t1EVokQpQUJFe4vGFl\npgztQLOapXlowgoGvbeYfYePRzoskQJHSUFyjUqlYnnnjpY83LU+X6/eRdcRs3QntEiYKSlIrhIV\nZQzoWIeJg9oSWyiaPq/M55kv1/HLSV2EFgkHJQXJlS6sHsfkIe24oXkNRk1P4vqX5vH9jz9HOiyR\nfE9JQXKt4kVieKpXI0b3bcamlENcOXI2k5Ztj3RYIvmakoLkelc2qsIXQztQv3JJ7hu7hL9MXKGO\n9URCRElB8oRqpYsytn8r7ulUh/cXbKXnaN3TIBIKSgqSZxSKjuLPXerzRr8W7DpwlO7Pz+aTpT9E\nOiyRfCWkScHMupjZOjNLMrOHMihzg5mtNrNVZvZ+KOOR/KHzeRX5/P72NKhaivvHLeXhCctVnSSS\nQ0KWFMwsGhgNdAUaAH3MrEGaMnWBh4G2zrmGwNBQxSP5S5W4ooy9uxWDOtVh7Hfb6Dl6Dkm7VZ0k\nkl2hPFO4CEhyzm1yzh0HxgE90pS5GxjtnNsL4JzTSO8StJjoKP7UpT5v9mvB7oPeOA0fL1F1kkh2\nhDIpVAO2Bcwn+8sC1QPqmdkcM5tvZl1CGI/kU53Oq8jn97XngqpxDP1gKX/+aDlHjqs6SeRshDIp\nWDrL0vZyFgPUBToBfYBXzaz07zZk1t/MEs0sMSVFfe/L71WOi+X9u1syuPO5fLjodHWShv0UOVOh\nTArJQI2A+epA2juPkoFPnHO/OOc2A+vwksRvOOfGOOcSnHMJFSpUCFnAkrfFREfx4BXn8Va/i9hz\n6BhXPz+H/y1KjnRYInlKUEnBPBea2RVm1sHMygXxtIVAXTOrZWaFgd7ApDRlPgY6+/soj1edtCn4\n8EV+r0O9Cnx+f3saVY/jD+OX8cfxy1SdJBKkmMxWmlk88CegC7AZSAFi8b7s9wEvAe+6dDq/d86d\nMLPBwJdANPC6c26VmT0OJDrnJvnrLjez1cBJ4I/OOXWLKdlWqVQs793VkhHfbGDU9CSWJe9jdN9m\n1K1UMtKhieRqltlgJmb2IfAiMNM5dyrNuirATcAe59yboQwyUEJCgktMTAzX7iQfmLUhhaHjlnL4\n+En+ee0FXNO0eqRDEgk7M1vknEvIslxeG+FKSUHOxu4DRxk8dgnfbf6Jm1qewyNXN6BITHSkwxIJ\nm2CTwllfaDazzmf7XJFwq1gqlvfvasmADrV5b8FWrn9pHtt+OhzpsERyney0Pnorx6IQCYOY6Cge\n7nY+L9/SnM17fuaq52czfa3ulxQJlNWF5gkZrQKCaYEkkutc0bAy9SuXZOC7i+n35kIGdz6XYZfV\nIzoqvVtrRAqWTJMCXnPR24C0Q14Z0CYkEYmEQc1yxZk4qA2PfLKSUdOTWLJtLyN6N6V8iSKRDk0k\norJKCguAg8656WlXmNnG0IQkEh6xhaJ5uldjEmqW5f8+WclVI2cz+qamNK9ZNtKhiURMVtcUuqaX\nEACcczpTkHzhhhY1mDCoDYVjorjx5fm8Nnszea1VnkhOyTQppHdTmkh+1LBqHJ8OaUfn+hV5YvJq\nBr+/hINHf4l0WCJhp5HXRHxxRQsx5pbmPNS1PlNW7aTHqDms26lO9aRgUVIQCWBmDOxYh/fuasnB\nYyfoOXoOE5eoUz0pOJQURNLRqnY5PhvSjgurxzHsg2X8deIKjp1Qp3qS/wXbS2oXM1toZrvN7Ccz\n22tmP4U6OJFI0l3QUhAFe6YwChiAN3JaBaC8/1ckX/vNXdApP3P1qNnMXK+BniT/CjYpJANL/cFw\nTp5+hDIwkdzkioaVmTSkHZVLxXL7G9/x/DcbOHVKjfMk/8nq5rXT/gR8amYzgGOnFzrnRoYiKJHc\nqFb54kwY1Ia/TFjBf79ez9Jt+3j2xibEFS0U6dBEckywZwp/xxsEpzRetd
Hph0iBUqxwDM/d2ITH\nezRk5voUuo+azertByIdlkiOCfZMoaJzrnlIIxHJI8yMW1vH07BqKQa9t5hrX5zDP6+5kGubafAe\nyfuCPVP4xswuDmkkInlM85plmTykPY2rl+aBD5fxfx+v5PiJU1k/USQXCzYp3A1MNbNDapIq8qsK\nJYvw3l0t6d+hNu/M/54bXp7Hjv1HIh2WyFkLNimUBwoBcahJqshvxERH8Zdu5/PCTc3YsOsgV42c\nzdykPZEOS+SsBJUU/OanJYDGQMuAh4j4ul1YhU8Gt6NM8cLc/NoCXpq5Ub2tSp4T7B3NdwJzgWnA\nU/7ffwbxvC5mts7MkszsoXTW325mKWa21H/cdYbxi+Qq51Yswcf3tqXrBVX49xdrGfjuIvW2KnlK\nsNVHQ4EEYItzrj3QHNiR2RPMLBoYDXQFGgB9zKxBOkU/cM418R+vBh+6SO5UokgMo/o25W9Xns/U\nNbvpMWoO63ept1XJG4JNCkedc0cAzKywc24VUD+L51wEJDnnNjnnjgPjgB5nH6pI3mFm3NW+Nu/f\n1ZIDR0/QY9QcJi3bHumwRLIUbFLYYWalgU+BL83sf8CuLJ5TDdgWMJ/sL0vrOjNbbmYfmVmNIOMR\nyRNa1i7HZ/e1o2HVUtw3dgl//3QVv5xUs1XJvYK90NzdObfPOfd/wJPAe2T9q9/S21Sa+U+BeOdc\nI2Aq8Fa6GzLrb2aJZpaYkqLOyCRvqVQqlrH9W3F7m3jemLOFvq/MZ/eBo5EOSyRdQY+nYGatzOxW\n59w3wEygUhZPSQYCf/lXB35z/uyc+9E5d7ovpVfwrlX8jnNujHMuwTmXUKGCWsJK3lMoOorHujdk\nRO8mrPzhAFc+P5vvNutWH8l9gm199DfgUeBv/qJY4P0snrYQqGtmtcysMNAbmJRmu1UCZrsDa4KJ\nRySv6tGkGh/f25YSRWLo88p8Xp21Sc1WJVcJ9kyhF9AN+BnAOfcDUCqzJzjnTgCDgS/xvuw/dM6t\nMrPHzay7X+w+M1tlZsuA+4Dbz/wliOQt51UuySeD23JJ/Yo8+dkaBo9dws/HTkQ6LBEALJhfKWa2\nwDnX0swWO+eamVkxYL5/LSCsEhISXGJiYrh3K5LjnHO8NHMT//lyLbUrlOClm5tzbsUSkQ5L8ikz\nW+ScS8iqXLBnChPMbDQQZ2b9gK+A17MToEhBZ2bc06kO79zZkp9+Pk6PUbP5YkWmt/+IhFywrY+e\nAibjXRNoDPzDOTc8lIGJFBRtzy3P5CHtOLdSSe55bzH/+nwNJ9RsVSIk0+ojM/vKOXd5GOPJkqqP\nJL86duIkT0xezbvzt9Kqdlme79OMCiWLRDosySdyqvpI7T9FwqRITDRP9ryQZ65vzJKt+7jq+Vks\n+n5vpMOSAiarkdfizOzajFY65ybkcDwiBV6v5tU5v0pJ7nl3Mb3HzONvVzbg1tY1MUvvflCRnJVl\nUgCuIuO7k5UUREKgYdU4Ph3cjmEfLuXRSatYsnUv/7z2QooVDnYEXZGzk9Un7Hvn3B1hiUREfiOu\nWCFevTWB0dOTeHbqetbuPMiLNzenVvnikQ5N8rGsrinofFUkgqKijCGX1OXNfhex88BRuj8/mykr\n1WxVQierpHBrVhswVXSKhFzHehWYPKQdtSsUZ+C7i/nHZ6vV26qERFZJ4XkzG2Jm5wQuNLPCZnax\nmb0F3Ba68ETktOplivHhwNbc0qomr8zaTN9X5rNLva1KDssqKXQBTgJjzWy7ma02s03ABqAP8Jxz\n7s0QxygiviIx0TzR84Jfe1sdOYt5G3+MdFiSjwTV9xGAmRUCygNHnHP7QhpVJnTzmohn/a6DDHx3\nEVv2/MyDV5zHwA51iIpSba6kL0duXjOzWDMbamajgH5ASiQTgoj8ql6lkkwa3I6uF1bh6Snr6P9O\nIvsP/xLpsCSPy6r66C0gAViB13X2f0
MekYgErUSRGEb1acqjVzdgxroUrho1i5U/7I90WJKHZZUU\nGjjnbnbOvYw3pkL7MMQkImfAzOjXthYfDGjNiZOOa1+cy7jvtmrwHjkrWSWF1HNRf9AcEcmlmtcs\nw+Qh7WhZqywPTVjBHz9azpHjJyMdluQxWSWFxmZ2wH8cBBqdnjazA+EIUESCV65EEd7sdxH3XVKX\n/y1O5poX5rB5z8+RDkvykEyTgnMu2jlXyn+UdM7FBExnOhyniERGdJTxwGX1eOP2FgF3Qe+MdFiS\nRwQ78pqI5DGdzqsYcBf0Iv75+RrdBS1ZUlIQyccC74Ie8+0meo+Zz/Z9RyIdluRiSgoi+dzpu6BH\n9mnK2h0H6DZyFt+s2RXpsCSXCmlSMLMuZrbOzJLM7KFMyvUyM2dmWd5tJyJnp3vjqky+rz1V4opy\n51uJ/EvVSZKOkCUFM4sGRgNdgQZAHzNrkE65ksB9wIJQxSIinlrlizNxUBtubnUOL3+7iRtfnscP\nqk6SAKE8U7gISHLObXLOHQfGAT3SKfcE8DSg7h5FwiC2kDcW9Ki+TVm/6xDdRsxi6mpVJ4knlEmh\nGrAtYD7ZX5bKzJoCNZxzkzPbkJn1N7NEM0tMSUnJ+UhFCqCrGlVl8pB2VC9TlLveTtQYDQKENilk\nNK6zt9IsCngO+ENWG3LOjXHOJTjnEipUqJCDIYoUbPHli/O/e9qkjtFww8vzSN57ONJhSQSFMikk\nAzUC5qsD2wPmSwIXADPMbAvQCpiki80i4RVbyGudNLpvMzbsOsSVI2fztaqTCqxQJoWFQF0zq2Vm\nhYHewKTTK51z+51z5Z1z8c65eGA+0N05p8ESRCLgykZVmDykHTXKFuXutxN5cvJqjp9QdVJBE7Kk\n4HegNxj4ElgDfOicW2Vmj5tZ91DtV0TO3unqpNta1+TV2Zu5/uV5bPtJ1UkFSdAjr+UWGnlNJDw+\nX7GDP3+0HDN45vrGXN6wcqRDkmzIkZHXRKTg6nZhFSbf146a5YrT/51F/P3TVRw7oa648zslBRHJ\nUM1yxfnontbc3iaeN+Zs4doX5rIx5VCkw5IQUlIQkUwViYnmse4NGXNLc7bvO8JVI2drZLd8TElB\nRIJyecPKTBnagWY1S/PQhBUMem8x+w4fj3RYksOUFEQkaJVKxfLOHS15uGt9vl69i64jZjFv44+R\nDktykJKCiJyRqChjQMc6TBjUhthC0fR9dT7/+XKtusjIJ5QUROSsNKpemslD2nF98+qMnr6RXi/N\n4/sfNR50XqekICJnrXiRGJ7u1ZjRfZuxOcXrcXXC4mRdhM7DlBREJNuubFSFL4Z2oGHVOB74cBn3\nj1vKgaO/RDosOQtKCiKSI6qVLsrY/q34w2X1+GzFDrqNmMWi73+KdFhyhpQURCTHREcZQy6py/iB\nrTGDG16ez4ipGzihi9B5hpKCiOS4ZueU4fP72tO9cVWem7qe3mPma5yGPEJJQURComRsIZ67sQnP\n3diYtTsP0nXELCYu0UXo3E5JQURC6pqm1fn8vvbUq1SSYR8s4553F7Pn0LFIhyUZUFIQkZA7p1wx\nPhzQmoe71mfa2t1c8dy3TFm5I9JhSTqUFEQkLKL9O6En39eOKqVjGfjuYoZ9sJT9h9V0NTdRUhCR\nsKpXqSQTB7Xl/kvqMmnZdi4fPpMZ63ZHOizxKSmISNgVio5i2GX1+HhQW0rFFuL2Nxbyl4krOHTs\nRKRDK/CUFEQkYi6sHsenQ9oxoENtxn63la4jvmXBJvW6GklKCiISUbGFonm42/mMH9CaKDN6vzKf\nJyav5ugvGvozEpQURCRXSIgvyxf3t+fmljV5bfZmuo2cxdJt+yIdVoET0qRgZl3MbJ2ZJZnZQ+ms\nH2hmK8xsqZnNNrMGoYxHRHK3YoVjeKLnBbxz50UcOX6S616cy3+/WsfxE+omI1xClhTMLBoYDXQF\nGg
B90vnSf985d6FzrgnwNPBsqOIRkbyjfd0KTBnagWuaVuP5aUn0GD2HNTsORDqsAiGUZwoXAUnO\nuU3OuePAOKBHYAHnXOC7XBzQ/e8iAkBc0UI8c31jXrk1gZSDx+g+ajbDp67n2AldawilUCaFasC2\ngPlkf9lvmNm9ZrYR70zhvvQ2ZGb9zSzRzBJTUlJCEqyI5E6XNajEV8M60PWCKgyfuoErR84mcYu6\n5A6VUCYFS2fZ784EnHOjnXN1gD8Df0tvQ865Mc65BOdcQoUKFXI4TBHJ7coWL8zIPk154/YWHDl+\nkl4vzeNvH6/QQD4hEMqkkAzUCJivDmzPpPw4oGcI4xGRPK5z/Yp8NawDd7StxfsLtnLZszP5ctXO\nSIeVr4QyKSwE6ppZLTMrDPQGJgUWMLO6AbNXAhtCGI+I5APFi8TwyNUNmDioLWWKFWbAO4sY8E4i\nuw4cjXRo+ULIkoJz7gQwGPgSWAN86JxbZWaPm1l3v9hgM1tlZkuBB4DbQhWPiOQvjWuU5tMh7fhT\nl/OYsS6FS/87k3fnf8+pU2qvkh2W1wa8SEhIcImJiZEOQ0Rykc17fuYvE1Ywb9OPtIgvw7+ubcS5\nFUtEOqxcxcwWOecSsiqnO5pFJM+rVb4479/dkqd7NWL9rkN0GzGLEVM36Ka3s6CkICL5gplxQ0IN\npj7QkSsuqMxzU9dz5chZar56hpQURCRfqVCyCM/7zVcPq/nqGVNSEJF86XTz1X5t43nPb746efl2\n8tp11HBTUhCRfKt4kRgevbohEwe1pWzxIgx+fwl9X1nAup0HIx1arqWkICL5XpMapZk8pB1P9LyA\nNTsP0G3kLB6btIr9R1SllJaSgogUCNFRxi2tajL9D53oc1EN3p63hc7PzGDcd1t1b0MAJQURKVDK\nFC/Mkz0vZNLgdtQuX5yHJqyg5wtzWLJ1b6RDyxWUFESkQLqgWhzjB7Zm+I1N2Ln/KNe8MJcHxy8j\n5eCxSIcWUUoKIlJgmRk9m1Zj2oOdGNCxNp8s/YGLn5nBq7M28cvJgnnjm5KCiBR4JYrE8HDX85ky\ntAPNapbhyc/W0HXELGZv2BPp0MJOSUFExFenQgne7NeCV29N4PiJU9z82gIGvrOIbT8djnRoYRMT\n6QBERHITM+PSBpVoV7c8r87axOjpG5m+bjf3dKrDgA51KFo4OtIhhpTOFERE0hFbKJrBF9flmz90\n5LIGlRg+dQOdnpnO2O+2ciIfX29QUhARyUTV0kUZ1bcZ4we2pnqZYjw8YQVXDP+WL1ftzJddZigp\niIgEoUV8WT4a2JqXb2kOwIB3FnHdi3P5bnP+6oVVSUFEJEhmxhUNK/Pl0A78+9oL+WHfEW54eR53\nvbWQ9bvyR39KGnlNROQsHTk5H2wSAAAOpUlEQVR+kjfmbubFGRv5+dgJrmtWnWGX1aNq6aKRDu13\ngh15TUlBRCSb9v58nBdmJPHW3O/BoF+beO7pVIfSxQpHOrRUSgoiImGWvPcwz329gQlLkilZJIZB\nnc/l9jbxxBaKfDNWJQURkQhZu/MAT09Zx7S1u6kSF8uwS+txXfPqREdZxGIKNimE9EKzmXUxs3Vm\nlmRmD6Wz/gEzW21my83sGzOrGcp4RETCoX7lUrx+ewvG9W9FpVKx/Ol/y+ky/Fs+W74j13fTHbKk\nYGbRwGigK9AA6GNmDdIUWwIkOOcaAR8BT4cqHhGRcGtVuxwTB7XhpZubcco57n1/MVcM/5ZPlv7A\nyVyaHEJ5pnARkOSc2+ScOw6MA3oEFnDOTXfOne5UZD5QPYTxiIiEnZnR5YIqfDWsI8/3aUqUGfeP\nW8qlz85kfOK2XNcbayiTQjVgW8B8sr8sI3cCX6S3wsz6m1mimSWmpKTkYIgiIuERHWVc3bgqX9zf\nnpdubkaxwtH88aPldH5mBu8v2MqxEycjHSIQ2qSQ3hWVdM+XzOxm
IAH4T3rrnXNjnHMJzrmEChUq\n5GCIIiLhFRXlnTlMHtKO125LoFyJIvxl4go6/WcGb83dwtFfIpscQpkUkoEaAfPVge1pC5nZpcBf\nge7OuYI95JGIFBhmxiXnV+LjQW14+46LqF6mKI9OWkWHp6fz6qxNHDkemeQQsiapZhYDrAcuAX4A\nFgJ9nXOrAso0xbvA3MU5tyGY7apJqojkR8455m/6iZHfbGDeph8pV7wwd7WvzS2ta1KiSPZHOcgV\n9ymYWTdgOBANvO6c+4eZPQ4kOucmmdlU4EJgh/+Urc657pltU0lBRPK7xC0/MXJaEt+uT6F0sULc\n0bYWt7WJJ65oobPeZq5ICqGgpCAiBcXSbfsYNW0DU9fspmSRGJ685gJ6NMmsvU7Ggk0KGnlNRCSX\nalKjNK/e1oJV2/czaloS8eWKh3yfSgoiIrlcw6pxvHhz87DsS+MpiIhIKiUFERFJpaQgIiKplBRE\nRCSVkoKIiKRSUhARkVRKCiIikkpJQUREUuW5bi7MLAX4/iyfXh7Yk4Ph5DTFlz2KL/tye4yK7+zV\ndM5lOfZAnksK2WFmicH0/REpii97FF/25fYYFV/oqfpIRERSKSmIiEiqgpYUxkQ6gCwovuxRfNmX\n22NUfCFWoK4piIhI5gramYKIiGQi3yUFM7vezFaZ2SkzS0iz7mEzSzKzdWZ2RQbPr2VmC8xsg5l9\nYGaFQxjrB2a21H9sMbOlGZTbYmYr/HJhG3bOzB4zsx8CYuyWQbku/jFNMrOHwhjff8xsrZktN7OJ\nZlY6g3JhPX5ZHQ8zK+K/90n+Zy0+1DEF7LuGmU03szX+/8n96ZTpZGb7A973R8IVn7//TN8v84z0\nj99yM2sWxtjOCzguS83sgJkNTVMmoscv25xz+eoBnA+cB8wAEgKWNwCWAUWAWsBGIDqd538I9Pan\nXwLuCVPc/wUeyWDdFqB8BI7lY8CDWZSJ9o9lbaCwf4wbhCm+y4EYf/op4KlIH79gjgcwCHjJn+4N\nfBDG97QK0MyfLgmsTye+TsDkcH/egn2/gG7AF4ABrYAFEYozGtiJ1/4/1xy/7D7y3ZmCc26Nc25d\nOqt6AOOcc8ecc5uBJOCiwAJmZsDFwEf+oreAnqGMN2C/NwBjQ72vELgISHLObXLOHQfG4R3rkHPO\nfeWcO+HPzgeqh2O/WQjmePTA+2yB91m7xP8MhJxzbodzbrE/fRBYA5zdoL+R0wN423nmA6XNrEoE\n4rgE2OicO9ubaXOlfJcUMlEN2BYwn8zv/xnKAfsCvmjSKxMK7YFdzrkNGax3wFdmtsjM+ochnkCD\n/VP0182sTDrrgzmu4XAH3q/H9ITz+AVzPFLL+J+1/XifvbDyq62aAgvSWd3azJaZ2Rdm1jCsgWX9\nfuWWz1xvMv4hF8njly15coxmM5sKVE5n1V+dc59k9LR0lqVtehVMmTMSZKx9yPwsoa1zbruZVQS+\nNrO1zrlvsxNXMPEBLwJP4B2DJ/CquO5Iu4l0nptjTdqCOX5m9lfgBPBeBpsJ2fFLR0Q+Z2fKzEoA\n/wOGOucOpFm9GK9K5JB/HeljoG4Yw8vq/coNx68w0B14OJ3VkT5+2ZInk4Jz7tKzeFoyUCNgvjqw\nPU2ZPXinojH+L7j0ypyRrGI1sxjgWiDDUbmdc9v9v7vNbCJeFUWOfKkFeyzN7BVgcjqrgjmuZy2I\n43cbcBVwifMrdNPZRsiOXzqCOR6nyyT7738c8FOI4vkdMyuElxDec85NSLs+MEk45z43sxfMrLxz\nLix9+gTxfoX0MxekrsBi59yutCsiffyyqyBVH00CevstP2rhZe7vAgv4XyrTgV7+otuAjM48csql\nwFrnXHJ6K82suJmVPD2Nd3F1ZYhjOr3vwHraazLY70KgrnmttgrjnVJPClN8XYA/A92dc4czKBPu\n4xfM8ZiE99kC77M2LaOEltP8
axevAWucc89mUKby6WscZnYR3vfEj2GKL5j3axJwq98KqRWw3zm3\nIxzxBcjw7D6Sxy9HRPpKd04/8L68koFjwC7gy4B1f8VrGbIO6Bqw/HOgqj9dGy9ZJAHjgSIhjvdN\nYGCaZVWBzwPiWeY/VuFVm4TrWL4DrACW4/0jVkkbnz/fDa8Vy8Ywx5eEV7e81H+8lDa+SBy/9I4H\n8Dhe8gKI9T9bSf5nrXYYj1k7vKqW5QHHrRsw8PTnEBjsH6tleBfw24QxvnTfrzTxGTDaP74rCGhl\nGKYYi+F9yccFLMsVxy8nHrqjWUREUhWk6iMREcmCkoKIiKRSUhARkVRKCiIikkpJQUREUikpiIhI\nKiWFAsjM/up3m7zc79q3ZQ5t91AGy0/6+1nl9wfzgJlF+esSzGykP13EzKb6ZW80s/b+c5aaWdGc\niDEUzGy4mXUwr/vupX6XzoFdJ7cJUxz3m9lGM3OWQTfimTz3kYB4TwZM3xuqeNPs/2HzuvNeZmZf\nmVk1f3k1MwvLzZDi0X0KBYyZtQaeBTo5546ZWXmgsPO7Fsjmtg8550pkttzvz+Z9YI5z7tE05Vrh\ndX/d0Z9/Ca9b5DeC3L/hfaZPZfOlBM3MyuLdKNcqYFknvC7HrwpXHP5+m+J1lzEHuMA5t+8sthED\n7HHOnVFSyS4zuxiY65w7ambDgCbOudv8dWOBZ5xzi8IZU0GlM4WCpwreP/0xAOfcntMJwcyam9lM\n83qn/PJ0NxdmVsfMpvjLZ5lZfX95LTObZ2YLzeyJYHbunNsN9MfrfdXMG5Bksp8s3gWa+L9QB+B1\nJ/6Imb3n7++P/r6Wm9nf/WXx/i/MF/A6IqthZpf7cS02s/Hmdf52evCWv/vLVwS8jhJm9oa/bLmZ\nXecvT3c7afQCpmT1us2sRcCx/cLMKvnLB/qvaZm/j6L+8nfNbLR5A+Js9M9E3jJvUKHXMji2S1wI\nunE2r9uGj/04F5jXdQNm1tY/PkvMbLaZ1Ql4TR+Z2WdmttnM7jazhwLKxaUT+zTn3FF/Nm036B8D\nN+X065IMRPqWaj3C+wBK4HVtsB54AejoLy8EzAUq+PM3Aq/7098Adf3plnh99YDfB40/fS9wKIN9\n/m45sBeoRMCAJKQZnASvC5Be/vTleIOiG96PmclAByAeOAW08suVx+s8rbg//2f8wYvwBm8Z4k8P\nAl71p58Chgfst0xm20nzOt4Crk6zLO3rKOIf2/L+/E3AGH+6XEC5f+MP6oSXIN/1p6/D6167gf/a\nl+KdCWT0HicDpc/y8xGD13184LL/AS386drAcn86Dn+gKrxOCd/zpwcCq/G6g6gKHARu99e9SJpu\nXdKJ4VUCBncC6gALI/2/U1AeebKXVDl7zuvOtzneGA6dgQ/MGzIyEbgAr6ti8EaV2uH/Om4DjLdf\nx4Ep4v9ti/eFBV4/SU+dQShnOqjM5f5jiT9fAq9Tw63A984bbAW8kbgaAHP8eAsD8wK2c7pX0EV4\nvdOC1ylh79MFnHN7zeyqLLZzWhUgJYvYzwcaAlMDju3pDhAbmdnjQGm8kdACe6L91P+7AtjunFsN\nYGar8ZJhWDpGxBtMpk7A+1/OvM7+ygLvmFltvPfzl4DnfOO8TgoPm9kRfvta4jPakZndCdQD7glY\nvBsvuUgYKCkUQM65k3jDlc4wsxV4PXYuAlY551oHljWzUni/HJtktLkz3b//JXIS75/9/GCfBvzL\nOfdymm3FAz+nKfe1c65PBts55v89ya+ffyP9MQ8y285pR/A6uMuM4f26bp/OurfxOmdcaWZ34SW1\ntLGeCpg+PX9W/7tm9jbQCNjqnOseRPnTmSDB/Tr41Ol1/8I7IxrjV8V9nE7saePPMHYzuxIYBnRw\nzgUmmFi84yxhoGsKBYx5A48HDvjRBPger+fYCuZdiMbMCplZQ+f1Db/ZzK73l5uZNfafO4dff2
EH\nVedrZhXwxr4e5fy6gSB9CdwRcH2gmn8dIq35QFszO9cvV8zM6mWx7a/werY8HWOZM9jOGuDcLLa/\nGqgWUBdf2H4djas4sNO8MQ76ZrGdbHPO3eqcaxJMQvDLO2AaAb/czez0D4Q44Ad/+vbsxOUfmxHA\nVc65tGNL1CN8Z0UFnpJCwVMCeMvMVpvZcrwqksecN55wL+ApM1uGV299uinlTcCd/vJV/Drm8P3A\nvWa2EO8LIiNFzW+SCkzF+xL++5kE7Zz7Cq/V0jz/7OYjvOqWtOVS8L6gxvqvbz5QP4vNPwmUMbOV\n/mvsfAbb+QzvGkJmsR/DO7bP+ttfgndtBuARvO6zv8ZLHmfNvKa+yXgj1a0ys5ezek6Q7gE6+xfh\nV/Pr6Hv/Aoab2Ry8M6/seBbvs/mx/1n5KGBdZ7zjLGGgJqki2WRms/F+4Z5xE1DJnF99NRvo4pw7\nGOl4CgIlBZFsMu/mvyPOueWRjiW/Ma9ZdHPnXHpDwUoIKCmIiEgqXVMQEZFUSgoiIpJKSUFERFIp\nKYiISColBRERSfX/cdjlE1FHkI4AAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1a18613908>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Visualization: seed-based model's predicted win probability\n",
"X = np.arange(-10, 10).reshape(-1, 1)\n",
"preds = clf.predict_proba(X)[:, 1]\n",
"\n",
"plt.plot(X, preds);\n",
"plt.xlabel('Seed Difference (Team 1 - Team 2)');\n",
"plt.ylabel('P(Team 1)');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Average Ranking Based Logistic Regression Model"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"data_dir = './March Madness 2018/'\n",
"df_massey = pd.read_csv(data_dir + 'MasseyOrdinals.csv')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>RankingDayNum</th>\n",
" <th>SystemName</th>\n",
" <th>TeamID</th>\n",
" <th>OrdinalRank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>35</td>\n",
" <td>SEL</td>\n",
" <td>1102</td>\n",
" <td>159</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>35</td>\n",
" <td>SEL</td>\n",
" <td>1103</td>\n",
" <td>229</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>35</td>\n",
" <td>SEL</td>\n",
" <td>1104</td>\n",
" <td>12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>35</td>\n",
" <td>SEL</td>\n",
" <td>1105</td>\n",
" <td>314</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>35</td>\n",
" <td>SEL</td>\n",
" <td>1106</td>\n",
" <td>260</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season RankingDayNum SystemName TeamID OrdinalRank\n",
"0 2003 35 SEL 1102 159\n",
"1 2003 35 SEL 1103 229\n",
"2 2003 35 SEL 1104 12\n",
"3 2003 35 SEL 1105 314\n",
"4 2003 35 SEL 1106 260"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_massey.head()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Composite pre-tournament rankings: average OrdinalRank over all ranking systems on day 133\n",
"final_day = 133\n",
"df_final_rankings = df_massey.loc[df_massey['RankingDayNum'] == final_day]\n",
"df_final_rankings = df_final_rankings.groupby(['Season', 'TeamID'])['OrdinalRank'].mean()\n",
"df_final_rankings = df_final_rankings.reset_index()\n",
"df_final_rankings.rename(columns={'OrdinalRank':'Avg. Rank'}, inplace=True)"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>TeamID</th>\n",
" <th>Avg. Rank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>1102</td>\n",
" <td>156.03125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>1103</td>\n",
" <td>168.00000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>1104</td>\n",
" <td>38.03125</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>1105</td>\n",
" <td>308.96875</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>1106</td>\n",
" <td>262.68750</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season TeamID Avg. Rank\n",
"0 2003 1102 156.03125\n",
"1 2003 1103 168.00000\n",
"2 2003 1104 38.03125\n",
"3 2003 1105 308.96875\n",
"4 2003 1106 262.68750"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_final_rankings.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>WScore</th>\n",
" <th>LTeamID</th>\n",
" <th>LScore</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1116</td>\n",
" <td>63</td>\n",
" <td>1234</td>\n",
" <td>54</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1120</td>\n",
" <td>59</td>\n",
" <td>1345</td>\n",
" <td>58</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1207</td>\n",
" <td>68</td>\n",
" <td>1250</td>\n",
" <td>43</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1229</td>\n",
" <td>58</td>\n",
" <td>1425</td>\n",
" <td>55</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1242</td>\n",
" <td>49</td>\n",
" <td>1325</td>\n",
" <td>38</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID WScore LTeamID LScore\n",
"0 1985 136 1116 63 1234 54\n",
"1 1985 136 1120 59 1345 58\n",
"2 1985 136 1207 68 1250 43\n",
"3 1985 136 1229 58 1425 55\n",
"4 1985 136 1242 49 1325 38"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['WLoc', 'NumOT'], inplace=True, axis=1)\n",
"df_tour.head()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>WAvgRank</th>\n",
" <th>LAvgRank</th>\n",
" <th>RankDiff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>240.343750</td>\n",
" <td>239.281250</td>\n",
" <td>1.062500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>2.676471</td>\n",
" <td>153.125000</td>\n",
" <td>-150.448529</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>36.000000</td>\n",
" <td>21.705882</td>\n",
" <td>14.294118</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>45.687500</td>\n",
" <td>20.735294</td>\n",
" <td>24.952206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>36.406250</td>\n",
" <td>50.312500</td>\n",
" <td>-13.906250</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID WAvgRank LAvgRank RankDiff\n",
"0 2003 134 1421 1411 240.343750 239.281250 1.062500\n",
"1 2003 136 1112 1436 2.676471 153.125000 -150.448529\n",
"2 2003 136 1113 1272 36.000000 21.705882 14.294118\n",
"3 2003 136 1141 1166 45.687500 20.735294 24.952206\n",
"4 2003 136 1143 1301 36.406250 50.312500 -13.906250"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Join each tournament game with both teams' composite rankings\n",
"df_win_ranks = df_final_rankings.rename(columns={'TeamID':'WTeamID', 'Avg. Rank':'WAvgRank'})\n",
"df_loss_ranks = df_final_rankings.rename(columns={'TeamID':'LTeamID', 'Avg. Rank':'LAvgRank'})\n",
"# Inner joins keep only games where both teams are ranked (the Massey data starts in 2003)\n",
"df_dummy = pd.merge(left=df_tour, right=df_win_ranks, how='inner', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_ranks, how='inner', on=['Season', 'LTeamID'])\n",
"df_concat['ScoreDiff'] = df_concat['WScore'] - df_concat['LScore']\n",
"df_concat['RankDiff'] = df_concat['WAvgRank'] - df_concat['LAvgRank']\n",
"df_total = df_concat[['Season', 'DayNum', 'WTeamID', 'LTeamID', 'WAvgRank', 'LAvgRank', 'RankDiff']]\n",
"df_total.head()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD8CAYAAABn919SAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4wLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvpW3flQAAIABJREFUeJztnX+MHVeV57+nn5/xawfcHXAgaeKx\nByFnYLPGQ4sEWVpNMkPM8rMHYiDazHp3WayVdlcbQL04moiQVSTMtnaYmZ3VrqIBrVEY4ySETpgw\na1gSNFq0NtOmbYyHWISQxHnOkszE7SHxS/K6++wf/apTXe/eqlu/69X7fiTL/epV1T33VvXpW986\n9xxRVRBCCBl8Rso2gBBCSDbQoRNCSE2gQyeEkJpAh04IITWBDp0QQmoCHTohhNQEOnRCCKkJdOiE\nEFIT6NAJIaQmrCuysTe84Q26devWIpskhJCB5/jx43+nqpuj9ivUoW/duhVzc3NFNkkIIQOPiDzp\nsh8lF0IIqQl06IQQUhPo0AkhpCbQoRNCSE2gQyeEkJpQaJQLMTM738bMkTM4t9DBFWMtTO/ejqmd\nE2WbRQgZMOjQS2Z2vo1b7z+FTncJANBe6ODW+08BAJ06ISQWlFxKZubImVVn7tHpLmHmyJmSLCKE\nDCp06CVzbqETazshhNigQy+ZK8ZasbYTQogNOvSSmd69Ha1mY822VrOB6d3bS7KIEDKo8KVoyXgv\nPhnlQghJCx16BZjaOUEHTghJDSUXQgipCXTohBBSE+jQCSGkJtChE0JITaBDJ4SQmkCHTgghNYEO\nnRBCagIdOiGE1AQ6dEIIqQl06IQQUhPo0AkhpCbQoRNCSE2gQyeEkJrglG1RRJ4A8GsASwAWVXVS\nRC4FcBjAVgBPAPiYqp7Px0xCCCFRxJmhX6eq71DVyd7n/QC+r6pvBfD93mdCCCElkUZy+TCAg72f\nDwKYSm8OIYSQpLg6dAXwXRE5LiL7etveqKrPAEDv/8vyMJAQQogbrhWLdqnqORG5DMD3RORR1wZ6\nfwD2AcCWLVsSmEgIIcQFpxm6qp7r/f8sgG8BeBeAX4nI5QDQ+/9Zy7F3qeqkqk5u3rw5G6sJIYT0\nEenQRWSjiLzW+xnADQB+CuBBAHt7u+0F8EBeRhJCCInGRXJ5I4BviYi3/1+o6v8Skb8BcI+IfBLA\nUwD25GcmIYSQKCIduqo+DmCHYfvfA/jdPIwihBASH64UJYSQmkCHTgghNYEOnRBCaoJrHDohhTM7\n38bMkTM4t9DBFWMtTO/ejqmdE2WbRUhloUMnlWR2vo1b7z+FTncJANBe6ODW+08BAJ06IRYouZBK\nMnPkzKoz9+h0lzBz5ExJFhFSfThDJ5Xk3EIn1vaqQbmIlAFn6KSSXDHWirW9SnhyUXuhA8WrctHs\nfLts00jNoUMnlWR693a0mo0121rNBqZ3by/JIncoF5GyoORCKoknTwyibDHochEZXOjQSWWZ2jkx\nEA48yBVjLbQNznsQ5CIy2FByISRjBlkuIoMNZ+iEZMwgy0VksKFDJyQHspCLGPpI4kKHTkgF4UpZ\nkgRq6IRUEIY+kiTQoRNSQRj6SJJAySUnBkn/HCRbhwWGPpIkcIaeA4O09HuQbB0mGPpIkkCHngOD\npH8Okq3DxNTOCXzxI1djYqwFATAx1sIXP3I1n5xIKJRccmCQ9M9BsnXYGNSVsqQ86NBzoCr6p4s2\nXhVbCSHpoeSSA1XQP1218SrYSgjJBjr0HKiC/umqjVfBVkJINlRechnUkLqy9U+bBm6SV8q2lRCS\nDZWeoTOkLjk2DVwAjh8hNaXSDp0hdcmZ3r0dYtiuAMePkJri7NBFpCEi8yLyl73P20TkmIj8XEQO\ni8j6rI1jSF1ypnZOQC3fcfwIqSdxZuj/AcDP
fJ+/BODLqvpWAOcBfDJLw4DBLhRcBSYcx292vo1d\nBx7Gtv0PYdeBh2NLMmmPJ4Rkg5NDF5E3A3g/gD/vfRYA1wO4r7fLQQBTWRvHkLp0uIxf2vcUfM9B\nSHVwnaH/MYD/CGC59/n1ABZUdbH3+WkAmYdJMKQuHS7jl/Y9Bd9zEFIdIsMWReQDAJ5V1eMi8jve\nZsOuRslWRPYB2AcAW7ZsiW1gWEjdoIY0psGlz3HGJe17imF6z5HmfhvGe5UUj0sc+i4AHxKR9wHY\nAOB1WJmxj4nIut4s/c0AzpkOVtW7ANwFAJOTk7b3dLEZxoouLn2OOy5pl/4PS+qANPfbMN6rpBwi\nJRdVvVVV36yqWwF8AsDDqvrPADwC4MbebnsBPJCblQaG8VHfpc9xxyXte4phec+R5n4bxnuVlEOa\nOPTPAfiMiDyGFU39K9mY5MYwPep7uPQ57rikfU8xLO850txvw3ivknKItfRfVX8A4Ae9nx8H8K7s\nTXKjzo/6Nr3Vpc9JxiXt0v9hSB2Q5n6r871KqkWlV4qGUddH/bAwQJc+13VcyibNuPKakKKofHIu\nG96MsG6RA2F66w/3X7+6j63PdR2XskkzrrwmpChENbPAk0gmJyd1bm6usPYGkW37HzLGfwqAXx54\nf9HmEEIqgIgcV9XJqP0GVnKpK0x3QAhJCh16xaDeSghJysBq6HWFeishJCl06DmRZqn3MIQBEkKy\nhw49B7jUmxBSBtTQc4BLvQkhZUCHngNc6k0IKYPaSi5lpivlUu9qw1S2pK7UcoZedhUdhh5Wl7Lv\nDULypJYOvWwNe1gyEA4iZd8bhORJLSWXKmjYgxZ6aJMh6iZPVOHeICQvaunQqWHHwxZmOffk8/jm\n8Xatwi95b5A6U0vJhRp2PGwyxKFjZ2snT/DeIHWmljN0Lp+Ph01uWLJk4hxkeYL3BqkztXTowOBp\n2EUQtxJSQ8To1Addnij63oj7HqJu7y1IcdRSciH9JKmEdNM1V1KeSEncMEmGVZI00KEPCWHherYw\nyzunrmb4ZUrihkkyrJKkobaSC1lLVLieTYagdJWOuGGSDKskaaBDHxKqGq5XN73Y6097oYOGiLGc\nIGAe99n5NkZq+t6CFAMllyGhiuF6ddOL/f0B7FFCpnH3jjUdU/Z1IoMDHfqQUMV0BHXTi039CWIb\nd9uxDZHSrxMZHCi5OFIHacCvh3v9+fThE6X1x1UvznLs8zyXSdLyIwB+uP9643dhawHKvk5kcKBD\nd6BuFYiq0h8XXT9LW/M+lwBWzRwI18FtYyG9c6e1lwwHlFwcGAZpoIz+uOj6Wdqa97kUKw7YRJQO\nbhoL0x+IQb7vSP5EOnQR2SAiPxKRkyJyWkTu6G3fJiLHROTnInJYRNbnb2451C2UrCr9cdH1s7S1\niHMpVvoBrOjfgNv7CtNY2Gb7g3rfkfxxkVxeBnC9qr4gIk0A/0dE/grAZwB8WVW/ISL/A8AnAfz3\nHG0tjaqG/CUly/5ksazdpitnbWuacwXt3tRqYqHT7dtvYqxl7M/sfBu7DjwcOk7eOw6vrbB+uFCH\n9z4kHpEzdF3hhd7HZu+fArgewH297QcBTOViYQWoYshfGrLqTxHL2rMc+6TnMtn94iuLaI6sFVhs\n54rT72DoYxDXvtctJJS44aShi0hDRE4AeBbA9wD8AsCCqi72dnkaQG3/9Fcx5C8NWfWniGXtWY59\n0nOZ7O4uKS7ZsM7pXHH6HRb6GKfvVXlPQorFKcpFVZcAvENExgB8C8BvmXYzHSsi+wDsA4AtW7Yk\nNLMc0j6yFvHIm0TyuOPbp3H+4opcMNZqJrYry2XtYf2Ik34gajySpDKw2b1wsYv5z9+Q+HjTdtu+\nYSGPadsk9SFWlIuqLgD4AYBrAYyJiPcH4c0AzlmOuUtVJ1V1cvPmzWlsLZS0j6xFPPImkTym7zu5\n6swBYKHT
xfS9JxPZZdNy427f1GpmMlZ5jXnc/gTZ1Go6b0/bVtbnIYOFS5TL5t7MHCLSAvB7AH4G\n4BEAN/Z22wvggbyMLIO0j6xFPPImkTy6S/0PUt1lTWRXXE3atr8IMhmrvMY8rY4vllhG0/as3hnU\n7b0PccNFcrkcwEERaWDlD8A9qvqXIvK3AL4hIncCmAfwlRztLJy0j6xFPPJmmbEviV1xq//Y9v/0\n4ROZ2JTXmKetcrRwsT8axrY9q4pKrMw0nEQ6dFX9CYCdhu2PA3hXHkZVgbThcq7Hp9HZ49oYtjx9\nRASz821j27PzbXzhwdOrYXrjo03c/sG3r+rRcZyEaX8vO6FrP2zYQgmzkBlc+2m6nnGvU1Ypi5n6\nePjgSlELaR9ZXY5Pq/kmkTyaDfPz/5Kqse3Z+Tam7z25xlGev9jF9H3JdHebXWnlgdn5Nl58ZbFv\ne3NECpMZbNfzuqs2U/4ghUCHbiFtuJzL8Wk137g2Tu2cwMyNOzA+an5JZ2p75sgZdJcNuvtSMt3d\nZlfa0ETb+4FLNqwrbJZqu56PPPpcrcJeSXURteRszoPJyUmdm5srrL0qMzvfxi0W7VgA/PLA+3Nt\nf9v+h4xxpsG2bfv590+qz2YZ1hlm50RBGRpdx5SQuIjIcVWdjNqPM/QS8B7NbRQRWuYa1hZlS9Lw\nwKxDDMPsTHPuOHYyVJCUDR16CYStBixKW3XVrV1tiRsemHWIoak/WZw7jp0MFSRlQ4deAmFhdEVp\nq666dRxb4oQHZh1i6O9P3DbDiGNn3VJEkMGDBS4yxq+3jo02oQpc6HTXaK+2MLaJsVZuv/w2Hdil\nvQmHajxAPGkhjwyWXn92HXjYqXCGiy4eZmewIPSS6hq93lYVyt/2plYTIisx6UE7mC2RxIUvRTMk\nWMUmSKvZwBc/cjUA9O3nfZfHL6zJrjjtRfUr7vmysCnNueO0bdv3o++cwDePt41jYvs+6ji/HUCx\n9wipNq4vRenQM8Q2M/Tj5csucvZls8uWu9tE0N7rrtqMRx59LpX9eY5B2LnjjofpXLbFUB7ejN11\ne9AOAKmvGakPrg6dkkuGuGi03j5FruLLQq/Ow948xyDs3HHHw3QuW7oCD5vTjnLmYXZEfUcIHXqG\nuFR+LyOErayKS0lm4EU8udjGI5j9MMyWqGudZoauIfulvWbU5esNo1wyJCp0rqwQtjLC6ZLEmRdV\nZee6q8xpnH/98uJqW1G2hF3rVrOBm6650jjmpu0mTM487TVjFaP6Q4eeIcGwtfHRJsZazdJD2MoI\np0sSZ15UlZ1HHn3OuH3Jl0Y4ypZgmGSwIPSdU1cbxzy4fazVtKZi8M6b1TVjFaP6Q8klY6qU4S6v\nx2uX87ro1MHz2CSMKN34ttlTOHTsLJZU0RDBTddciTunrrbaHSaVeG3Fqa70xx9/h3FcbfeCabst\nbcCyqjFtgEt4rK1vrtvJ4EGHXlOC4Xbe4zUQb7FQ0vNG6fam8wjMdQzDdOPbZk/h7qNPrX5eUl39\n7HfqLqGX/rbCdPY8xjXOe45gX/wVqMLsKetdCikOSi41Ja/Ha9fzRun2pvMoVhJZ2Y4xcejYWaft\nYekWPJqNV1Pt5l1dKUic9xxRfWFqguGFDr2m5PV47XreKN3edh7t7euqG7uGB0b1e3y0iZkbd6wp\nTG2y31Z9KIuqSK7vOeKExyZtgwwmtZNcGJa1govkkWSc4jy2h71PCEt/4LJwxrPfhsjKAiKvf2Oj\nzTXShEt7WVRX8o9zqzmCzuIyVGHU+v1l49oLHXz2npO45fCJvvS/acJjq/SOh2RPrWboDMt6lbDH\n6zTjZAv5s21PYl8UfvttCLCmfy+8tNhXrSnv4svBcb7YXXHmwKta/22zp4z7e/sA8UImPeJeD1IP\nauXQGZb1KmGP12nGyRbyZ9uexL4owjTkhghGmyMIFlnqLis2rl+XWm6IY7
eLbu/X+sP2N4VMeqGS\nJuJeD1IPaiW5MCxrLbbH6zTjZNunbQjnC8o4pu/D5BXT/l5bJgTAL774Pmzb/5Dx+wudLk7cfkOs\n9kzJuvwZFtsLnVVHO/fk82vCJ12W+S+prhbnjpJRvDH2rmtY+gHbdaIkWW9q5dAZluVGmnEK02+n\n7z25Wn80GD4XN4zStP/0fSfNcY0B+5P0z8W+4D5+SeQzh09g2Xc+F2fucev9pzD35PPWsM3gvp5N\nYdfCJdwxq5BLUh1qJbkwLMuNNOM0vXt7X2ihR7CYtF8miCvzmPbvLqmxYHXQ/iT9c7EvTBJZNm51\no9NdwqFjZyOdedCm6d3b0Rzpvxr+8Es/lCTrT60cOsOy3EgzTlM7J5wcj4fLyss422347U/SPxc7\nkkp3nh2jTfuvW5wZvT9j58yeHRjzJRULhl+ajnPdTgaPWkkuQPKwrEHTFqOWu8/Ot3HHt0+vhuqN\ntZr4wofevsbpJe2fawUjILkM4hKa57cnSquP6quLfXFsCvLlXnoAWy52F7nFZFOc61iEJDlov0d1\no1Yz9KQMWrijt9zdm9UFQ+Bm59uYvu/kmrjrhU4X0/eezKRPJkmj2ZC+x/80MkiSNoDk19LFvrBw\nwbBfJL8NNpnE5Mxt0lbSkMS8JclB+z2qI3ToGDxtMWq5+8yRM+gu9buIri+bYBpMksbMjTsws2eH\nVeaIK4MkacPre5Jr6WJfWIbFP/r4O3DztVusoYSeDVM7J3DJhnQPxklDEvOWJAft96iORN5ZInIl\ngK8BeBNW3v3cpap/IiKXAjgMYCuAJwB8TFXP52dqfgyathi13L2IijdhmQTjHpNlG2mupYt9YftM\n7ZzAnVNXWzMnejbY0gcEsUkwaa5hnitFB+33qI64TBUWAXxWVX8sIq8FcFxEvgfgXwD4vqoeEJH9\nAPYD+Fx+puanzxUZ7phFH8L01m37H8JISAy0S59cbXRNW1sktms5NmquRuTFky+p9i2xtxF8PzHa\nHIEC6HTDY10UwNb9DznHqIed5y23fmeNzQCcY+jz0rcZNlw+kZKLqj6jqj/u/fxrAD8DMAHgwwAO\n9nY7CGAqLyOBfPW5osIds+jD7HwbIwYN1kNhn8E3R8zhbElsjNLxy2J69/a+Jf4A8MJL5mpEgH2J\nvQnT+4mL3eVIZ+4njTMPnqO90MH0vScxfd/J0GtWhL7NsOHyiaWhi8hWADsBHAPwRlV9Blhx+gAu\ny9o4P3nqc0WFO2bRh5kjZ7BkicUO4vf7Y60mZvaYw9mS2OiatrZopnZOYOP6/gfPbkQ1Ig+Xqkqm\n9xNl0l3WPptcYuiz1rcZNlw+zm9nROQSAN8EcIuq/oOE5JEIHLcPwD4A2LJlSxIbAeSvzxWRhS6L\nPsQJm1MFnghUu4l67Hat1GNzaUuq2HXgYadl/8CrMsGmVhMiK8Ua4kogQS50wlPcRo13sK+ebQsX\nu7Fi8MumvdBZvRZhKRuyhNkcy8Vphi4iTaw486+r6v29zb8Skct7318O4FnTsap6l6pOqurk5s3J\nM8CFxSsPCmn7MDvftoayuZzX5bHbZsvYaHPNsWEEz2tqNygTLHS6qzJGHAnEpd/B7VHjHeyrZ9sg\nOXMPbwyD7xA8BGBYYY2IdOiyMhX/CoCfqeof+b56EMDe3s97ATyQvXmvUgd9Lm0fZo6csTqVqPhs\n7/iox26bjar9lXrCiFr2b5IJos7jStQ4h8WTJ+lr1el0l6BqjmtXgGGFNcJlhr4LwB8AuF5ETvT+\nvQ/AAQDvEZGfA3hP73Nu1EGfS9uHMKkgKj477Hj/dpuNNhnDxd68qiTZiBrnsHjypH2tOhc69icM\nhhXWB9EM3ri7Mjk5qXNzc4W1Vzdsy8Zdq/zEOT6oeb/48iIWDI7Oc4rG5ewCZHF7NUSwrLpG8w97\nF2D7zr99bLQJ1RVHFzzeNk5lMz7axD
90FhNFyYRdJ//1TzKufrj0Px9E5LiqTkbtV7tcLnVmevf2\nvsr1cSQb1+NNaVa9Zff+bIf+Y4PnBcKdeXNEAIGT7BLU1OeefB7fPN42poEN2mI7xh92GEwjaxqn\nKnCh0+0r3OHK1te3sGdyS+j1D0uvC5jHFbCnF2Z63uLhDH3ASDsDcjneNkMdH21idP26yNlb2MIm\nAMbFMKYoF9sCHNv2sFmoy2Ie20zVH+VyxVgLW1/fwtHHz685X0MEr1knuBgjHj0rxkebeLm7FNp2\nQwS/+OL7Qq9/2BMcED27T/sESexwhl4B8nj8NIWFxWnHJazMpqkuXOzi9g++fbUtf0k079/sfBu3\nhFTSAYD/d+El3HL4ROTq0q2WykM2xxymBbvIFCZn5EW5eJx/8WVcd9VmnD7369Xt46NNvP8fX467\njz4V2UbWCID5z99grdLksaS6Op5jreZq9kc/YaGNtugql/TC1OiLg8m5cqKozHN5tOMaumgLT4zC\nZXVpEvuvGGtZbXdcNoHbZk+FFqG+2F3G3UefWuvkL3ZLceaAeyimH1vmTevYoT91gumYOoQWDzp0\n6DlRVOa5PNqJE7oYFZ7ogml1aVz7PS3YZntrndutfujY2cT9yJtgSoNgKKYp5YENU+ZNWzUqxcr7\nkCTphQcttHjQoUPPiaIeP/NoJ27ookt4or+qThCTHBLHfn9Yos1211wrS6qVlQhmbgxPTzxz4w6M\nW2bSJoL9DKtGdaHTjZVeeFBDiwcdaug5kVXmudn5Nr7w4OnVR/yN6xtoNkZWw+3GRptrIjaSthPE\nr7V7Gr3tlz2qKtH4aBPzn79hNUOgiW37H1qj/7tWB2qIWEMZ/Tqxl1nR5Xxv2rShkmGLtxw+gfHR\nFf0bAO749unV9xVeRar5z9+wuv/sfBufvedkrMybtmpUV/SqQqVJL0zyhzP0nMji8XN2vo3pe0+u\n0WtffGUJC71FIu2FDl54aTH0UTwtYXpysK2oTIc3XXOltZ2gJm+r7BNkSRW33n8Kt82eCtX3w1aH\n+rnpmisTVwQqgvMXu/jMPSfw2XvDK1J51y1u5k3KJoMNHXpOZPH4OXPkjLXKvUd3WbFx/bpCq9B4\nmB77wzId3jl1dWhVH2BtZZ9gAeTR5ojx5Wanu4RDx86G6vu21aEeDRHcfO0W3Dl1dWhFoNHmCG6+\ndkuohJQ3ywpjxk3XjJJhmTcpmww2jEOvMLbKN0EEwC8DWRXztsHWZpz945477BgbScbG1S5bWGWZ\neDYmGVtSXRiHPsBEadZBTFkVs4h/n51vWxcJjYj06d5h+29qNbHrwMNrbIr7niHs/DbivEuIGnev\nUpAXO5+28lAeeDa2miPGhUbeeESlM3a9b9Lca0wTkD2coVeM4PLpKFrNxppHYtPxwX2ytqPVbOCj\n75xYs7Tej2mZv+0Ym61xx8Vj11suxdc/9e7I/eKe/+ZrV3L7lxV/PiKITAMQ3McbW6A/VUOzIYCi\nL7VD2H2T5l7L6j4dFlxn6NTQK0ZUDPTG9Q2MtZqZV72Pa0fw/CYNG1jRpi/ZsM5YUeeRR59z1muT\nxoYffdytbnnc8x86dtbpnUBevG5DMzJEUQHj2BrTGS9p3/sal+pNSe+1otZpDBuUXCqGLQbaVfvM\nKi497v426WFZ1Vrlvt1LH+DyyJ00NtxVEknS310HHsZ1V23G61rrjKGjebLQ6Ub+IVFFX0jnpw+f\niPUOIix8M00VpLT3KeUaM5yhV4y0y6ezWn4dd3+bcwldjg84pyxIGlfvOntOcv72Qgd3H32qcGfu\n4fLHavrek30hnXEIq2gUdl2jUjekuU+LSqsxiNChV4y0ccBZxRHbloED/ZVvWs0GbrrmSmu7JpsE\n6HMuYY/crnHkQcJi37M4f9XpLqtVDnMhrKJRWKqAKOkkzX1KucYOHXrFSBsHnFUccdgycJM2e+fU\n1d
Z2TTbFrZ4TPMf4aHM1FtybhfudiwCrceWu/fWfv06EzeRd+hp2TZJWQUpznzKrox1q6BUk7vJp\nk56YRf7pcUtaAZcKR0FN078E/9xCxxryZwrBvOPbp1ft8FK/eue60OniTZs2GMPuJn/j0lg2elQr\nEDE/vOsYVaEpTAYJSxUQRdI0AVml1agjdOgDTl5VYmbn23jhpcW+7c1G/5JxFxuC+5icefCRe3a+\njen7Tq6JkFnorCx9b8ir1ZPaCx1M33tyTWhk0IYoG5OGRQ4qwZQNwXH2E5YKIW0VrSSU0eagQMll\nwMlLT7SlHdi4fl3fHwoXG2xhgQ2R0BBMk5NZDsRLAyufTaGRYUvhs0j9O4i4pmzwCEuFUEaqAKYn\nsMMZegVwKWpskwjy0hNtx1/odPvKs5mKRwfPYTvfkupqSTrXvsWhvdAJzfJ4bqGD2fl2JbMr5oVp\nrG2pkQE3PbxoZ8qsjmbo0EvGJgWEFUL238h56Ym2825qNdfYa3PmQRvC0uHG7Vtcwl4Kjq5vOFVZ\nqhNxx5ra9OBAyaVkbFJAVPZAj7zSndrOK9JftciESzUbP7a+xanCE2dfj4uvLA2N1OJhHWtDumLT\nOxNSXejQSyZMinDZPy890XZe26rPIGHVbGyY+hanCo9X0ScOwxLREsQ41oF0xeOjTczcaE6zS6oJ\nk3OVhKdD2x5zbWF9Y60mNr5mXaiunuey6KgQN2DFEYyut9toO4cpHNK1ff+xYZp5kCpmTCwK27sL\nLquvHkzOVWFcqgCZVl42RwQvvrIYuuQ572XRLisqL3S6oe2nkYlcjnVdHWob52GhjPuH5Asdegm4\nVAEyrby0ZS2MCg/Mclm0i3QSjHYMtp9GJnI51iULYkPEOs5eatw6YRuJou8fki+RkouIfBXABwA8\nq6r/qLftUgCHAWwF8ASAj6lqZJ5SSi4rJK0m43JckZVq4lQPKqtSTtLxCKtGZMpDU3U8h16l+4e4\nk2XFov8J4M8AfM23bT+A76vqARHZ3/v8uSSGDiNJQw1djstzWXRQWw2LQQ8yZnixGUerjdo3+P3W\n17dw9PHzVsdrq7jknSOMQXPmACAhBTEUK+8mklSRcsXlWmeh3d82ewqHjp3FkioaIqvVpYaFSMlF\nVf8awPOBzR8GcLD380EAUxnbVWuSasgux+UVxmjSVl98ZbEv1K05ImgYwt9eeGkxsVYbta/p+x/+\n4vnQl51LqmvOlSbF7CAQVd3IG4frrtqc+f3jcq2z0O5vmz2Fu48+tXrdl1Rx99GncNvs8KwzSKqh\nv1FVnwGA3v+XZWdS/UmqIbscl1cYo63KzSUb1q1pa2bPDrz2Nf0Pfv6K9Lbz2bTaLJftm3T1sIpL\nw0TcKlKuJE0NEVe7P3TsbKz7xs0NAAALGUlEQVTtdST3laIisg/APgDYsqV+L5uSknTpcllLnm0y\nxMLFLm7/4NtXH5VnjpxJlQqg3VuK7+9j2L5hWrcJ26x9WEMXg5xb6GR+j7mkp8gihUXYtfUkpbqH\nXyadof9KRC4HgN7/z9p2VNW7VHVSVSc3b7ZnbSPZkFfYmU1D9VIB+NuzRVT4z7GpZV8sFLSXS8+L\nI4+xdqlOlEWlrbCopmEJv0zq0B8EsLf3814AD2RjDklLXmFncVIBKMxVjfw6bFhluKC9Va8mZFoy\nP4jklYK2qHc/UesPhiH8MtKhi8ghAP8XwHYReVpEPgngAID3iMjPAbyn95lUgLyyL8ZNBWCrOO8R\nlULAb2/R1YT88egubc7siZ9yoEzGWs01VZ/yTkFb1Lsfl/UHda9qxKX/NSPNsvok7PxP33WqahQM\nSbv4ymJkcWVvaTqwthLRuQIiUby25558fjUMzkZzBFhcHpxwRv+S/9n5Nr7w4OnV9x4b1zfQbIzg\nQqeLK8ZauO6qzXjk0edWxz74OU6q5yIp+vcgb1zj0OnQa4ap8k6r
2chl9jU738b0vSf7ik00G7Im\nqZPJpuaIrKkwZMO0X2NEsGSIw9v1lkvx46curG2nIYChIIYLtnbqQKvZwEffOYHDPzqbaGyC5/Gn\neva2l1l0osjfgyJgLpchpchqLq5VjYwhj8uKjevXrUoVtsdkUyWipWXFxvWN1WMaIrj52i34+qfe\n3df3mRt3rEoiEtKOibo6c+DVUM00ztx/nqqlCxjWqkacodeYvB+DXZeJp93PRNyl6FHZLUn2pEkX\nUDUJp2w4Qx9yisia5xpqtqFpvs2C2+OEqMXZNyq7JUmH7aknaQgkMz4mhw69phSRNc811OzlxWXj\n8cHtpvM1R6SvElHccLakBaBNKQzIWmwpiNOEQDLjY3Lo0GtKXuGLflx1SptMG9xuOt/Mnh2rlYiS\naqFhfR4fbaLle1LwfPjEWAv/Zc+OWqbSdcEUumn6bEtBnEavLuLerSssEj2gRGmMeWZd9GNaJh60\nbcSS6c/0qG5bdh7lHMLGI6wA8uj6dbj9g2+3nn9q50Rk2GLdGB9tYv7zN/RtDxvjLNMF2K7X2GgT\nuw48TF09BM7QBxAXjTGvrItJbLO96XStLJSkTf942Aogw7CviWt/czwTOweF8xe7feNRpK5tlN4a\nghdeCq/WRejQBxIXjbGssC2TbcsAWs2RvjDDrPJUR43H1M4JXLLB/jAapc8+8ffD96gfHI8idW3T\nvbtx/bq+EEvq6v1QchlAXDXGMjIz2mx7qbucWcWb4KO/TU7x2+KSamB2vo07vn16zQrW8dFm5IrW\nOtJe6KyRN1zGOCk2Kcd/726zZNWkrr4WztAHkCwy0+VF3raZHv1dsjuaKib52dRqYvq+k33Oexid\nuUfcMU6Cq5RT5Xu+StChDyBl6eMu5G2b6dHfJbtj2DtNL2tkVBqCYcZljJPgKuVU+Z6vEnToA0iV\nlzXnbZvtETsqu+OFkNqnYVkjyatEjXES4siHVb3nqwQ19AGlrMpFLsSxLe4Sb5ueG5ZFb3a+jRER\na+jhLYdPrORn5wTdiS9//B2rGRb9OrstE2MYccJrq3zPVwXO0ElpJAmFi/vo7bURFUc+RGHmqbAV\n1W4vdHD30adihxVSSskWOnRSGklC4eI+eidd9k/suBbVdgkrpJSSLZRcSGkkXeId59GbYW354Lpy\n1mX8KaVkBx36kFBGOtK80hPE6UtYDDXJH4YVFgsllyGgjHSkeaUniNuXqheYrjPUwouHDn0IKCMd\naV7pCeL2JdhGnIpFJJqGSGgmRkopxULJZQgoIx1plukJ/BKLTbk1tec/blNrJU3uxa45NztJxrJq\nZikd6kjRUicd+hBQVCrdPNo0Ffu1tRd23ELIwiKSHGrkdoL3oCcPAtHpoJNCyWUIKCPWN6s2XcIO\npdde3ONIOqiRh1OG1MkZ+hDgzQaKfPTLqk0XWUjRP+NhuGL+UCMPpwypkw59SCgj1tfWZtZhhxOG\nx37XcMVGSEoAYmdirBXr3cfYaBOqKzl1wq65yzF56dJZn7cMqZOSCymUrMMObY/9LuGKzYYYCxyT\ncFykluB1Pn+xi4VON/SauxxjSjmQRQhuHqG9ZUiddOikUNKGHY61mhgfbUaGxpmOG/UVgx4fbWLm\nxh2rBY4ZzuhGQ8RJaol6h2G65i7HmFIOZKFL56F3l5HWIJXkIiLvBfAnABoA/lxVD2RiFaktSXTF\npHKR63FTOyfw6cMnYp9/GFlWdRpTF504uI/LMTZ5LK0unZfeXbTUmXiGLiINAP8NwD8F8DYAN4nI\n27IyjNSTqlaeKbv9QcF1nFz2C+7jcoztSSrt9avqfRmXNJLLuwA8pqqPq+orAL4B4MPZmEXqSlXT\npdY1RcCut1yaWb/iXKck7z5cjjG988ji/qnqfRmXNA59AsBZ3+ene9vWICL7RGROROaee+65FM2R\nOlDVdKkmu/xL2YMavIc3XzTt
v3G92TnlodaPCPDWyzauzmAbIrj52i34+qfevdqvMBoi2PWWSzHW\nerX26sb1DYy1ot9XmAiO5/hoM/JcLsd47zyyvn+qel/GRTRhyJaI7AGwW1X/de/zHwB4l6r+e9sx\nk5OTOjc3l6g9QggZVkTkuKpORu2XZob+NIArfZ/fDOBcivMRQghJQRqH/jcA3ioi20RkPYBPAHgw\nG7MIIYTEJXHYoqouisi/A3AEK2GLX1XV05lZRgghJBap4tBV9TsAvpORLYQQQlLAlaKEEFITEke5\nJGpM5DkATxbWYPV4A4C/K9uIisMxCofjE05dx+c3VHVz1E6FOvRhR0TmXEKPhhmOUTgcn3CGfXwo\nuRBCSE2gQyeEkJpAh14sd5VtwADAMQqH4xPOUI8PNXRCCKkJnKETQkhNoEPPCRGZEZFHReQnIvIt\nERnzfXeriDwmImdEZLdv+3t72x4Tkf3lWF4MIrJHRE6LyLKITAa+G/rxCTLMffcjIl8VkWdF5Ke+\nbZeKyPdE5Oe9/8d720VE/rQ3Zj8Rkd8uz/KCUFX+y+EfgBsArOv9/CUAX+r9/DYAJwG8BsA2AL/A\nSuqERu/n3wSwvrfP28ruR47j81sAtgP4AYBJ33aOT/9YDW3fDWPxTwD8NoCf+rb9ZwD7ez/v9/2u\nvQ/AX2ElY/G1AI6VbX/e/zhDzwlV/a6qLvY+HsVKNkpgpQjIN1T1ZVX9JYDHsFIsZKgKhqjqz1TV\nVLCR49PPMPd9Dar61wCeD2z+MICDvZ8PApjybf+arnAUwJiIXF6MpeVAh14M/worMwXAXhjEqWDI\nEMDx6WeY++7CG1X1GQDo/X9Zb/vQjVuq5FzDjoj8bwBvMnz1h6r6QG+fPwSwCODr3mGG/RXmP64D\nHYLkMj6mwwzbajk+MbCNCQln6MaNDj0Fqvp7Yd+LyF4AHwDwu9oT9RBeGKRWBUOixsfC0IxPDFhM\nJpxficjlqvpMT1J5trd96MaNkktOiMh7AXwOwIdU9aLvqwcBfEJEXiMi2wC8FcCPwIIhHhyffoa5\n7y48CGBv7+e9AB7wbf/nvWiXawFc8KSZusIZen78GVYiNb4nK4V7j6rqv1HV0yJyD4C/xYoU829V\ndQkAhqlgiIj8PoD/CmAzgIdE5ISq7ub49KMsJrOKiBwC8DsA3iAiTwO4HcABAPeIyCcBPAVgT2/3\n72Al0uUxABcB/MvCDS4YrhQlhJCaQMmFEEJqAh06IYTUBDp0QgipCXTohBBSE+jQCSGkJtChE0JI\nTaBDJ4SQmkCHTgghNeH/A0QcOu3pSCSwAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1102cf9b0>"
]
},
"metadata": {},
"output_type": "display_data"
}
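],
"source": [
"# Is the ranking difference correlated with the score difference?\n",
"plt.scatter(df_concat['RankDiff'], df_concat['ScoreDiff']);\n",
"plt.xlabel('Avg. Rank Difference (Winner - Loser)');\n",
"plt.ylabel('Score Difference (Winner - Loser)');"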
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Build a symmetric training set: each game appears once as a win and once, mirrored, as a loss\n",
"df_wins = pd.DataFrame()\n",
"df_wins['RankDiff'] = df_total['RankDiff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['RankDiff'] = -df_total['RankDiff']\n",
"df_losses['Result'] = 0"
]
},
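{
"cell_type": "markdown",
"metadata": {},
"source": [
"The training set is made symmetric by listing every game twice. The first copy is from the winner's side with `Result = 1`; the second negates the rank difference and sets `Result = 0`. This keeps the classes balanced and centres the feature on 0. Below is a minimal sketch of the mirroring on three made-up games. The `toy` values are hypothetical and not taken from the competition data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical rank differences (winner's avg. rank - loser's avg. rank) for three games\n",
"toy = pd.DataFrame({'RankDiff': [-12.5, 3.0, -40.0]})\n",
"\n",
"# Each game once from the winner's side (Result = 1)...\n",
"wins = pd.DataFrame({'RankDiff': toy['RankDiff'], 'Result': 1})\n",
"# ...and once mirrored from the loser's side (Result = 0)\n",
"losses = pd.DataFrame({'RankDiff': -toy['RankDiff'], 'Result': 0})\n",
"\n",
"mirrored = pd.concat((wins, losses), ignore_index=True)\n",
"# Balanced labels, and the feature sums to zero across the mirrored pairs\n",
"print(mirrored['Result'].mean(), mirrored['RankDiff'].sum())"
]
},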
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>RankDiff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.062500</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-150.448529</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>14.294118</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>24.952206</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>-13.906250</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" RankDiff Result\n",
"0 1.062500 1\n",
"1 -150.448529 1\n",
"2 14.294118 1\n",
"3 24.952206 1\n",
"4 -13.906250 1"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train = df_predictions['RankDiff'].values.reshape(-1,1)\n",
"Y_train = df_predictions['Result'].values\n",
"X_train, Y_train = shuffle(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best log_loss: -0.5468, with best C: 0.0016681005372000592\n"
]
}
],
"source": [
"# Fit the model, tuning C by cross-validation scored with the negated log loss\n",
"logreg2 = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf = GridSearchCV(logreg2, params, scoring='neg_log_loss', refit=True)\n",
"clf.fit(X_train, Y_train)\n",
"print('Best log_loss: {:.4}, with best C: {}'.format(clf.best_score_, clf.best_params_['C']))"
]
},
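{
"cell_type": "markdown",
"metadata": {},
"source": [
"`GridSearchCV` with `scoring='neg_log_loss'` maximises the *negated* log loss, which is why `best_score_` and the `Log Loss` column are negative. A score of -0.5468 corresponds to a log loss of 0.5468, and lower log loss is better. The small sketch below uses hypothetical outcomes and probabilities to check the textbook formula against `sklearn.metrics.log_loss`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics import log_loss\n",
"\n",
"# Hypothetical game outcomes (1 = team 1 won) and predicted P(team 1 wins)\n",
"y_true = np.array([1, 0, 1, 1])\n",
"p_win = np.array([0.8, 0.3, 0.6, 0.9])\n",
"\n",
"# Log loss by hand: -mean(y * log(p) + (1 - y) * log(1 - p))\n",
"manual = -np.mean(y_true * np.log(p_win) + (1 - y_true) * np.log(1 - p_win))\n",
"sk = log_loss(y_true, p_win)\n",
"\n",
"# The 'neg_log_loss' scorer would report -sk for these predictions\n",
"print(manual, sk, -sk)"
]
},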
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Type</th>\n",
" <th>Log Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Seed Based Logistic Regression</td>\n",
" <td>-0.553150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Avg. Ranking Based Logistic Regression</td>\n",
" <td>-0.546793</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Type Log Loss\n",
"0 Seed Based Logistic Regression -0.553150\n",
"0 Avg. Ranking Based Logistic Regression -0.546793"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Store model results\n",
"df_results = pd.concat([df_results, pd.DataFrame({'Type': ['Avg. Ranking Based Logistic Regression'], 'Log Loss': [clf.best_score_]}, columns=['Type', 'Log Loss'])])\n",
"df_results.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### FiveThirtyEight Elo Logistic Regression Implementation"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Home-court advantage, in Elo points\n",
"HOME_ADVANTAGE = 100\n",
"# K-factor: how far a single result can move a team's rating\n",
"K = 22"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>WScore</th>\n",
" <th>LTeamID</th>\n",
" <th>LScore</th>\n",
" <th>WLoc</th>\n",
" <th>NumOT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>20</td>\n",
" <td>1228</td>\n",
" <td>81</td>\n",
" <td>1328</td>\n",
" <td>64</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1106</td>\n",
" <td>77</td>\n",
" <td>1354</td>\n",
" <td>70</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1112</td>\n",
" <td>63</td>\n",
" <td>1223</td>\n",
" <td>56</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1165</td>\n",
" <td>70</td>\n",
" <td>1432</td>\n",
" <td>54</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1192</td>\n",
" <td>86</td>\n",
" <td>1447</td>\n",
" <td>74</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID WScore LTeamID LScore WLoc NumOT\n",
"0 1985 20 1228 81 1328 64 N 0\n",
"1 1985 25 1106 77 1354 70 H 0\n",
"2 1985 25 1112 63 1223 56 H 0\n",
"3 1985 25 1165 70 1432 54 H 0\n",
"4 1985 25 1192 86 1447 74 H 0"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load regular season data\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"rs = pd.read_csv(data_dir + 'RegularSeasonCompactResults.csv')\n",
"rs.head()"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"364"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Teams\n",
"team_ids = set(rs.WTeamID).union(set(rs.LTeamID))\n",
"len(team_ids)"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Elo lookup dict: every team starts at a rating of 1500\n",
"elo_dict = {team_id: 1500 for team_id in team_ids}"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# New columns for the margin of victory and each team's running Elo rating\n",
"rs['margin'] = rs.WScore - rs.LScore\n",
"rs['w_elo'] = None\n",
"rs['l_elo'] = None"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
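"# Elo calculation (FiveThirtyEight-style, with a margin-of-victory multiplier)\n",
"# K and HOME_ADVANTAGE are global constants set elsewhere in the notebook\n",
"def elo_pred(elo1, elo2):\n",
"    # Probability that team 1 beats team 2\n",
"    return 1. / (10. ** (-(elo1 - elo2) / 400.) + 1.)\n",
"\n",
"def expected_margin(elo_diff):\n",
"    # Denominator of the margin multiplier; grows with the winner's Elo edge\n",
"    return 7.5 + 0.006 * elo_diff\n",
"\n",
"def elo_update(w_elo, l_elo, margin):\n",
"    elo_diff = w_elo - l_elo\n",
"    pred = elo_pred(w_elo, l_elo)\n",
"    # Bigger wins move ratings more, damped when the winner was already favored\n",
"    mult = ((margin + 3.) ** 0.8) / expected_margin(elo_diff)\n",
"    update = K * mult * (1 - pred)\n",
"    return pred, update"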
]
},
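{
"cell_type": "markdown",
"metadata": {},
"source": [
"A self-contained sketch of a single Elo update, to make the formulas above concrete. It re-implements `elo_pred` and `elo_update` with the K-factor passed in explicitly; `k=20` is an assumed value for illustration and not necessarily the `K` used in this notebook. It also checks that the Elo win probability is a logistic curve in the rating difference with slope ln(10)/400 per point, which is one reason a logistic regression on the Elo difference is a natural model later on."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"\n",
"def elo_win_prob(elo1, elo2):\n",
"    # Probability that team 1 beats team 2 under the Elo model\n",
"    return 1.0 / (10.0 ** (-(elo1 - elo2) / 400.0) + 1.0)\n",
"\n",
"def elo_update_sketch(w_elo, l_elo, margin, k=20.0):\n",
"    # Same margin-of-victory multiplier as elo_update; k=20 is an assumed K-factor\n",
"    pred = elo_win_prob(w_elo, l_elo)\n",
"    mult = ((margin + 3.0) ** 0.8) / (7.5 + 0.006 * (w_elo - l_elo))\n",
"    return pred, k * mult * (1.0 - pred)\n",
"\n",
"# Evenly rated teams: the winner had a 50% chance beforehand\n",
"demo_pred, demo_update = elo_update_sketch(1500.0, 1500.0, margin=10)\n",
"assert demo_pred == 0.5\n",
"print('pred = {:.3f}, rating change = {:.2f}'.format(demo_pred, demo_update))\n",
"\n",
"# An upset by 10 points moves ratings more than the favorite winning by 10\n",
"_, demo_upset = elo_update_sketch(1400.0, 1600.0, margin=10)\n",
"_, demo_expected = elo_update_sketch(1600.0, 1400.0, margin=10)\n",
"assert demo_upset > demo_expected\n",
"\n",
"# The Elo curve is a logistic function with slope ln(10)/400 per rating point\n",
"logistic_100 = 1.0 / (1.0 + math.exp(-100.0 * math.log(10) / 400.0))\n",
"assert abs(elo_win_prob(1600.0, 1500.0) - logistic_100) < 1e-12"
]
},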
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The loop below uses index labels as row positions, so check they run 0..n-1\n",
"assert np.all(rs.index.values == np.array(range(rs.shape[0]))), \"Index is out of order.\""
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Iterate through all games\n",
"preds = []\n",
"for i in range(rs.shape[0]):\n",
"\n",
"    # Get key data from current row\n",
"    w = rs.at[i, 'WTeamID']\n",
"    l = rs.at[i, 'LTeamID']\n",
"    margin = rs.at[i, 'margin']\n",
"    wloc = rs.at[i, 'WLoc']\n",
"\n",
"    # Does either team get a home-court advantage?\n",
"    w_ad, l_ad = 0., 0.\n",
"    if wloc == \"H\":\n",
"        w_ad += HOME_ADVANTAGE\n",
"    elif wloc == \"A\":\n",
"        l_ad += HOME_ADVANTAGE\n",
"\n",
"    # Get Elo updates as a result of the game\n",
"    pred, update = elo_update(elo_dict[w] + w_ad,\n",
"                              elo_dict[l] + l_ad,\n",
"                              margin)\n",
"    elo_dict[w] += update\n",
"    elo_dict[l] -= update\n",
"    preds.append(pred)\n",
"\n",
"    # Store the post-game Elo ratings in the games dataframe\n",
"    rs.loc[i, 'w_elo'] = elo_dict[w]\n",
"    rs.loc[i, 'l_elo'] = elo_dict[l]"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def final_elo_per_season(df, team_id):\n",
"    # Return the team's Elo rating after its last regular-season game of each season\n",
"    d = df.copy()\n",
"    d = d.loc[(d.WTeamID == team_id) | (d.LTeamID == team_id), :]\n",
"    d.sort_values(['Season', 'DayNum'], inplace=True)\n",
"    d.drop_duplicates(['Season'], keep='last', inplace=True)\n",
"    w_mask = d.WTeamID == team_id\n",
"    l_mask = d.LTeamID == team_id\n",
"    d['season_elo'] = None\n",
"    d.loc[w_mask, 'season_elo'] = d.loc[w_mask, 'w_elo']\n",
"    d.loc[l_mask, 'season_elo'] = d.loc[l_mask, 'l_elo']\n",
"    out = pd.DataFrame({\n",
"        'team_id': team_id,\n",
"        'season': d.Season,\n",
"        'season_elo': d.season_elo\n",
"    })\n",
"    return out"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"df_list = [final_elo_per_season(rs, i) for i in team_ids]\n",
"season_elos = pd.concat(df_list)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>season</th>\n",
" <th>season_elo</th>\n",
" <th>team_id</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>134286</th>\n",
" <td>2014</td>\n",
" <td>1317.05</td>\n",
" <td>1101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>139681</th>\n",
" <td>2015</td>\n",
" <td>1201.11</td>\n",
" <td>1101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145038</th>\n",
" <td>2016</td>\n",
" <td>1213.74</td>\n",
" <td>1101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>150369</th>\n",
" <td>2017</td>\n",
" <td>1233.86</td>\n",
" <td>1101</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3606</th>\n",
" <td>1985</td>\n",
" <td>1404.46</td>\n",
" <td>1102</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" season season_elo team_id\n",
"134286 2014 1317.05 1101\n",
"139681 2015 1201.11 1101\n",
"145038 2016 1213.74 1101\n",
"150369 2017 1233.86 1101\n",
"3606 1985 1404.46 1102"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"season_elos.head()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>W_Elo</th>\n",
" <th>L_Elo</th>\n",
" <th>Elo_Diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" <td>1591.58</td>\n",
" <td>1611.14</td>\n",
" <td>-19.5577</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" <td>1571.38</td>\n",
" <td>1582.63</td>\n",
" <td>-11.2464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" <td>1748.49</td>\n",
" <td>1430.35</td>\n",
" <td>318.145</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" <td>1582.04</td>\n",
" <td>1578.1</td>\n",
" <td>3.94023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" <td>1615.96</td>\n",
" <td>1600.98</td>\n",
" <td>14.9841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season WTeamID LTeamID W_Elo L_Elo Elo_Diff\n",
"0 1985 1116 1234 1591.58 1611.14 -19.5577\n",
"1 1985 1120 1345 1571.38 1582.63 -11.2464\n",
"2 1985 1207 1250 1748.49 1430.35 318.145\n",
"3 1985 1229 1425 1582.04 1578.1 3.94023\n",
"4 1985 1242 1325 1615.96 1600.98 14.9841"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Attach each team's final regular-season Elo rating to every tournament game\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['DayNum','WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"\n",
"df_win_elos = season_elos.rename(columns={'team_id':'WTeamID', 'season':'Season', 'season_elo':'W_Elo'})\n",
"df_loss_elos = season_elos.rename(columns={'team_id':'LTeamID', 'season':'Season', 'season_elo':'L_Elo'})\n",
"df_dummy = pd.merge(left=df_tour, right=df_win_elos, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_elos, on=['Season', 'LTeamID'])\n",
"df_concat['Elo_Diff'] = df_concat['W_Elo'] - df_concat['L_Elo']\n",
"df_concat.head()"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
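"# Prediction dataframe: each game appears once from the winner's side (Result = 1)\n",
"# and once from the loser's side with the Elo difference negated (Result = 0)\n",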
"df_wins = pd.DataFrame()\n",
"df_wins['Elo_Diff'] = df_concat['Elo_Diff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['Elo_Diff'] = -df_concat['Elo_Diff']\n",
"df_losses['Result'] = 0"
]
},
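{
"cell_type": "markdown",
"metadata": {},
"source": [
"Why mirror every game? With only the winners' rows, every label would be 1 and there would be nothing to classify. A toy sketch of the same construction, using three made-up Elo differences (illustrative values, not from the data): the mirrored set is exactly class-balanced, and every feature value has a sign-flipped twin with the opposite label, so by symmetry a fitted logistic regression should put its intercept at or very near zero."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical Elo differences (winner minus loser) for three games\n",
"toy_elo_diff = pd.Series([120.0, -15.0, 40.0])\n",
"\n",
"toy_wins = pd.DataFrame({'Elo_Diff': toy_elo_diff, 'Result': 1})\n",
"toy_losses = pd.DataFrame({'Elo_Diff': -toy_elo_diff, 'Result': 0})\n",
"toy_mirrored = pd.concat([toy_wins, toy_losses], ignore_index=True)\n",
"print(toy_mirrored)\n",
"\n",
"# One row per team per game, so the two classes are exactly balanced\n",
"assert (toy_mirrored['Result'] == 1).sum() == (toy_mirrored['Result'] == 0).sum()\n",
"# Every feature value has a sign-flipped twin\n",
"values = np.sort(toy_mirrored['Elo_Diff'].values)\n",
"assert np.allclose(values, np.sort(-values))"
]
},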
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Elo_Diff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>-19.5577</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>-11.2464</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>318.145</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3.94023</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>14.9841</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Elo_Diff Result\n",
"0 -19.5577 1\n",
"1 -11.2464 1\n",
"2 318.145 1\n",
"3 3.94023 1\n",
"4 14.9841 1"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X_train = df_predictions['Elo_Diff'].values.reshape(-1,1)\n",
"Y_train = df_predictions['Result'].values\n",
"X_train, Y_train = shuffle(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Best log_loss: -0.5428, with best C: 0.0001291549665014884\n"
]
}
],
"source": [
"# Fit the Elo model, tuning the regularization strength C by cross-validated log loss\n",
"logreg2 = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf = GridSearchCV(logreg2, params, scoring='neg_log_loss', refit=True)\n",
"clf.fit(X_train, Y_train)\n",
"# best_score_ is the negated log loss (scikit-learn's 'higher is better' convention)\n",
"print('Best log_loss: {:.4}, with best C: {}'.format(clf.best_score_, clf.best_params_['C']))"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Type</th>\n",
" <th>Log Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Seed Based Logistic Regression</td>\n",
" <td>-0.553150</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Avg. Ranking Based Logistic Regression</td>\n",
" <td>-0.546793</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>FiveThirtyEight Elo Logistic Regression</td>\n",
" <td>-0.542821</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Type Log Loss\n",
"0 Seed Based Logistic Regression -0.553150\n",
"0 Avg. Ranking Based Logistic Regression -0.546793\n",
"0 FiveThirtyEight Elo Logistic Regression -0.542821"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Store model results (pd.concat instead of DataFrame.append, which pandas 2.0 removed)\n",
"df_results = pd.concat([df_results, pd.DataFrame({'Type': ['FiveThirtyEight Elo Logistic Regression'], 'Log Loss': [clf.best_score_]}, columns=['Type', 'Log Loss'])])\n",
"df_results.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Select Ranking Systems"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season WTeamID LTeamID\n",
"0 1985 1116 1234\n",
"1 1985 1120 1345\n",
"2 1985 1207 1250\n",
"3 1985 1229 1425\n",
"4 1985 1242 1325"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['DayNum','WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"df_tour.head()"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['SEL', 'AP', 'BIH', 'DUN', 'ENT', 'GRN', 'IMS', 'MAS', 'MKV', 'MOR', 'POM', 'RPI', 'SAG', 'SAU', 'SE', 'STR', 'USA', 'WLK', 'WOB', 'BOB', 'DWH', 'ERD', 'ECK', 'BRZ', 'ARG', 'RTH', 'WOL', 'HOL', 'COL', 'DOL', 'GRS', 'HER', 'TSR', 'WTE', 'BD', 'MGY', 'CNG', 'SIM', 'DES', 'JON', 'LYN', 'NOR', 'RM', 'REI', 'ACU', 'BCM', 'CMV', 'SAP', 'DC', 'KLK', 'WIL', 'ROH', 'RIS', 'REN', 'SCR', 'DOK', 'PIG', 'KPK', 'PKL', 'TRX', 'MB', 'JCI', 'PH', 'LYD', 'KRA', 'RTR', 'UCS', 'ISR', 'CPR', 'BKM', 'JEN', 'REW', 'STH', 'SPW', 'RSE', 'PGH', 'CPA', 'RTB', 'HKB', 'BPI', 'TW', 'NOL', 'DC2', 'DCI', 'OMY', 'LMC', 'RT', 'KEL', 'KMV', 'RTP', 'TMR', 'AUS', 'ROG', 'PTS', 'KOS', 'PEQ', 'ADE', 'BNM', 'CJB', 'BUR', 'HAT', 'MSX', 'BBT', '7OT', 'SFX', 'EBP', 'TBD', 'CRO', 'D1A', 'TPR', 'BLS', 'DII', 'KBM', 'TRP', 'LOG', 'SP', 'STF', 'WMR', 'PPR', 'STS', 'UPS', 'SPR', 'MvG', 'TRK', 'BWE', 'HAS', 'FSH', 'DAV', 'KPI', 'FAS', 'MCL', 'HRN', 'RSL', 'SMN', 'DDB', 'INP', 'JRT', 'ESR', 'FMG', 'PRR', 'SMS', 'HKS', 'MUZ', 'OCT', 'SGR', 'ZAM', 'JNG', 'CRW', 'PMC', 'YAG']\n"
]
}
],
"source": [
"# Get all ranking systems, minus a hand-picked list of excluded systems\n",
"ranking_types = df_massey['SystemName'].unique().tolist()\n",
"ranking_types = [e for e in ranking_types if e not in ('MIC', 'GC', 'RAG', 'TOL', 'EBB', 'BP5', 'MPI', 'BOW', 'CTL')]\n",
"print(ranking_types)"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Fit a logistic regression on one system's rank difference and report its log loss\n",
"def logreg_type(mytype):\n",
"    df_type = df_massey.loc[(df_massey['RankingDayNum'] == final_day) & (df_massey['SystemName'] == mytype)]\n",
"    df_type = df_type.drop(labels=['RankingDayNum', 'SystemName'], axis=1)\n",
"    df_type = df_type.rename(columns={'OrdinalRank': 'Type Rank'})\n",
"\n",
"    df_win_ranks = df_type.rename(columns={'TeamID': 'WTeamID', 'Type Rank': 'WTypeRank'})\n",
"    df_loss_ranks = df_type.rename(columns={'TeamID': 'LTeamID', 'Type Rank': 'LTypeRank'})\n",
"    df_dummy = pd.merge(left=df_tour, right=df_win_ranks, how='left', on=['Season', 'WTeamID'])\n",
"    df_concat = pd.merge(left=df_dummy, right=df_loss_ranks, on=['Season', 'LTeamID'])\n",
"    df_concat['RankDiff'] = df_concat['WTypeRank'] - df_concat['LTypeRank']\n",
"    df_total = df_concat[['Season', 'WTeamID', 'LTeamID', 'WTypeRank', 'LTypeRank', 'RankDiff']]\n",
"\n",
"    # Sentinel score for systems that cannot be evaluated; filtered out later\n",
"    skipped = pd.DataFrame({'Type': [mytype], 'Log Loss': [999]}, columns=['Type', 'Log Loss'])\n",
"\n",
"    # Only evaluate systems with ranks for more than 980 tournament games\n",
"    if len(df_total) <= 980:\n",
"        return skipped\n",
"\n",
"    df_wins = pd.DataFrame({'RankDiff': df_total['RankDiff'], 'Result': 1})\n",
"    df_losses = pd.DataFrame({'RankDiff': -df_total['RankDiff'], 'Result': 0})\n",
"    df_predictions = pd.concat((df_wins, df_losses))\n",
"\n",
"    X_train = df_predictions['RankDiff'].values.reshape(-1, 1)\n",
"    Y_train = df_predictions['Result'].values\n",
"    X_train, Y_train = shuffle(X_train, Y_train)\n",
"\n",
"    # Skip systems with a missing rank for any winning team\n",
"    if np.isnan(X_train).any():\n",
"        return skipped\n",
"\n",
"    logregtype = LogisticRegression()\n",
"    params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"    clf = GridSearchCV(logregtype, params, scoring='neg_log_loss', refit=True)\n",
"    clf.fit(X_train, Y_train)\n",
"\n",
"    print('{} - Best log_loss: {:.4}, with best C: {}'.format(mytype, clf.best_score_, clf.best_params_['C']))\n",
"    return pd.DataFrame({'Type': [mytype], 'Log Loss': [clf.best_score_]}, columns=['Type', 'Log Loss'])"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MOR - Best log_loss: -0.5515, with best C: 0.0001291549665014884\n",
"POM - Best log_loss: -0.5514, with best C: 0.0001291549665014884\n",
"RPI - Best log_loss: -0.5583, with best C: 0.0001291549665014884\n",
"SAG - Best log_loss: -0.5491, with best C: 0.0001291549665014884\n",
"WLK - Best log_loss: -0.5523, with best C: 0.0001291549665014884\n",
"RTH - Best log_loss: -0.5557, with best C: 0.0001291549665014884\n",
"WOL - Best log_loss: -0.5573, with best C: 0.0016681005372000592\n",
"COL - Best log_loss: -0.5591, with best C: 0.0001291549665014884\n",
"DOL - Best log_loss: -0.5571, with best C: 0.0001291549665014884\n"
]
}
],
"source": [
"# Score every ranking system and collect the results in one dataframe\n",
"df_type_scores = pd.concat([logreg_type(mytype) for mytype in ranking_types])"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Type</th>\n",
" <th>Log Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>SAG</td>\n",
" <td>-0.549115</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>POM</td>\n",
" <td>-0.551438</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>MOR</td>\n",
" <td>-0.551542</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>WLK</td>\n",
" <td>-0.552273</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>RTH</td>\n",
" <td>-0.555652</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>DOL</td>\n",
" <td>-0.557051</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>WOL</td>\n",
" <td>-0.557305</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>RPI</td>\n",
" <td>-0.558276</td>\n",
" </tr>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>COL</td>\n",
" <td>-0.559128</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Type Log Loss\n",
"0 SAG -0.549115\n",
"0 POM -0.551438\n",
"0 MOR -0.551542\n",
"0 WLK -0.552273\n",
"0 RTH -0.555652\n",
"0 DOL -0.557051\n",
"0 WOL -0.557305\n",
"0 RPI -0.558276\n",
"0 COL -0.559128"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
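"# Drop skipped systems; scores are negated log losses, so higher (closer to zero) is better\n",
"df_type_scores = df_type_scores.loc[df_type_scores['Log Loss'] != 999]\n",
"df_type_scores = df_type_scores.sort_values(by='Log Loss', ascending=False)\n",
"df_type_scores"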
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Final Model Selection\n",
"I currently have three candidate models to test, tune, and consider for the upcoming tournament:\n",
"1. FiveThirtyEight Elo Ratings\n",
"2. Average Select Ranking Systems\n",
"3. Composite Model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1. FiveThirtyEight Elo Ratings Model"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>W_Elo</th>\n",
" <th>L_Elo</th>\n",
" <th>Elo_Diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" <td>1591.58</td>\n",
" <td>1611.14</td>\n",
" <td>-19.5577</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" <td>1571.38</td>\n",
" <td>1582.63</td>\n",
" <td>-11.2464</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" <td>1748.49</td>\n",
" <td>1430.35</td>\n",
" <td>318.145</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" <td>1582.04</td>\n",
" <td>1578.1</td>\n",
" <td>3.94023</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" <td>1615.96</td>\n",
" <td>1600.98</td>\n",
" <td>14.9841</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID W_Elo L_Elo Elo_Diff\n",
"0 1985 136 1116 1234 1591.58 1611.14 -19.5577\n",
"1 1985 136 1120 1345 1571.38 1582.63 -11.2464\n",
"2 1985 136 1207 1250 1748.49 1430.35 318.145\n",
"3 1985 136 1229 1425 1582.04 1578.1 3.94023\n",
"4 1985 136 1242 1325 1615.96 1600.98 14.9841"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# How does Elo perform alone?\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"\n",
"df_win_elos = season_elos.rename(columns={'team_id':'WTeamID', 'season_elo':'W_Elo', 'season':'Season'})\n",
"df_loss_elos = season_elos.rename(columns={'team_id':'LTeamID', 'season_elo':'L_Elo', 'season':'Season'}) \n",
"df_dummy = pd.merge(left=df_tour, right=df_win_elos, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_elos, on=['Season', 'LTeamID'])\n",
"df_concat['Elo_Diff'] = df_concat['W_Elo'] - df_concat['L_Elo']\n",
"df_concat.head()"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>Elo_Diff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1116</td>\n",
" <td>1234</td>\n",
" <td>-19.5577</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1120</td>\n",
" <td>1345</td>\n",
" <td>-11.2464</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1207</td>\n",
" <td>1250</td>\n",
" <td>318.145</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1229</td>\n",
" <td>1425</td>\n",
" <td>3.94023</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>136</td>\n",
" <td>1242</td>\n",
" <td>1325</td>\n",
" <td>14.9841</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID Elo_Diff Result\n",
"0 1985 136 1116 1234 -19.5577 1\n",
"1 1985 136 1120 1345 -11.2464 1\n",
"2 1985 136 1207 1250 318.145 1\n",
"3 1985 136 1229 1425 3.94023 1\n",
"4 1985 136 1242 1325 14.9841 1"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Prediction dataframe\n",
"df_wins = pd.DataFrame()\n",
"df_wins['Season'] = df_concat['Season']\n",
"df_wins['DayNum'] = df_concat['DayNum']\n",
"df_wins['WTeamID'] = df_concat['WTeamID']\n",
"df_wins['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_wins['Elo_Diff'] = df_concat['Elo_Diff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['Season'] = df_concat['Season']\n",
"df_losses['DayNum'] = df_concat['DayNum']\n",
"df_losses['WTeamID'] = df_concat['WTeamID']\n",
"df_losses['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_losses['Elo_Diff'] = -df_concat['Elo_Diff']\n",
"df_losses['Result'] = 0\n",
"\n",
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4158"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove play-in games, which are played on DayNum 134 or 135\n",
"df_predictions = df_predictions.loc[df_predictions['DayNum'] > 135]\n",
"len(df_predictions)"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.54538007699370672"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Train on seasons before 2014 and hold out 2014-2017 for testing\n",
"df_train = df_predictions.loc[df_predictions['Season'] < 2014]\n",
"df_test = df_predictions.loc[df_predictions['Season'] >= 2014]\n",
"\n",
"X_train = df_train['Elo_Diff'].values.reshape(-1,1)\n",
"Y_train = df_train['Result'].values\n",
"\n",
"X_test = df_test['Elo_Diff'].values.reshape(-1,1)\n",
"Y_test = df_test['Result'].values\n",
"\n",
"logreg = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf = GridSearchCV(logreg, params, scoring='neg_log_loss', refit=True)\n",
"clf.fit(X_train, Y_train)\n",
"# Negated log loss of the refitted model on the training seasons\n",
"clf.score(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>Elo_Diff</th>\n",
" <th>Result</th>\n",
" <th>Elo_Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2112</th>\n",
" <td>2017</td>\n",
" <td>146</td>\n",
" <td>1314</td>\n",
" <td>1246</td>\n",
" <td>17.9249</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2113</th>\n",
" <td>2017</td>\n",
" <td>146</td>\n",
" <td>1376</td>\n",
" <td>1196</td>\n",
" <td>144.711</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2114</th>\n",
" <td>2017</td>\n",
" <td>152</td>\n",
" <td>1211</td>\n",
" <td>1376</td>\n",
" <td>-242.598</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2115</th>\n",
" <td>2017</td>\n",
" <td>152</td>\n",
" <td>1314</td>\n",
" <td>1332</td>\n",
" <td>-45.0282</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2116</th>\n",
" <td>2017</td>\n",
" <td>154</td>\n",
" <td>1314</td>\n",
" <td>1211</td>\n",
" <td>-10.9314</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID Elo_Diff Result Elo_Pred\n",
"2112 2017 146 1314 1246 17.9249 0 1\n",
"2113 2017 146 1376 1196 144.711 0 1\n",
"2114 2017 152 1211 1376 -242.598 0 0\n",
"2115 2017 152 1314 1332 -45.0282 0 0\n",
"2116 2017 154 1314 1211 -10.9314 0 0"
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Predicted class labels (0/1) for the 2014-2017 hold-out games\n",
"Y_pred = clf.predict(X_test)\n",
"df_test = df_test.assign(Elo_Pred=Y_pred)\n",
"df_test.tail()"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confusion Matrix: \n",
"[[185 67]\n",
" [ 67 185]] \n",
"\n",
" precision recall f1-score support\n",
"\n",
" 0 0.73 0.73 0.73 252\n",
" 1 0.73 0.73 0.73 252\n",
"\n",
"avg / total 0.73 0.73 0.73 504\n",
"\n"
]
}
],
"source": [
"# Confusion matrix and per-class precision/recall for the hold-out seasons\n",
"print('Confusion Matrix: ')\n",
"print(confusion_matrix(Y_test, Y_pred), '\\n')\n",
"print(classification_report(Y_test, Y_pred))"
]
},
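{
"cell_type": "markdown",
"metadata": {},
"source": [
"The confusion matrix above scores hard 0/1 labels from `clf.predict`, while the competition is scored on the log loss of predicted probabilities. A self-contained sketch of scoring held-out games with `predict_proba` and `sklearn.metrics.log_loss`, on synthetic Elo differences (the data generation, the 0.005 slope and the seed are assumptions for illustration, not this notebook's data). With the notebook's own variables, the equivalent call would be `log_loss(Y_test, clf.predict_proba(X_test)[:, 1])`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import log_loss\n",
"\n",
"def holdout_log_loss_demo(seed=0):\n",
"    rng = np.random.RandomState(seed)\n",
"    # Synthetic Elo differences and outcomes from a logistic model (assumed slope 0.005)\n",
"    elo_diff = rng.normal(0.0, 150.0, size=2000)\n",
"    p_true = 1.0 / (1.0 + np.exp(-0.005 * elo_diff))\n",
"    result = (rng.uniform(size=elo_diff.size) < p_true).astype(int)\n",
"    X = elo_diff.reshape(-1, 1)\n",
"    model = LogisticRegression().fit(X[:1500], result[:1500])\n",
"    # Column 1 of predict_proba is P(Result == 1) for each held-out game\n",
"    proba = model.predict_proba(X[1500:])[:, 1]\n",
"    y_test = result[1500:]\n",
"    # Predicting 0.5 for every game scores ln(2), about 0.693: a simple baseline\n",
"    baseline = log_loss(y_test, np.full(y_test.shape, 0.5))\n",
"    return log_loss(y_test, proba), baseline\n",
"\n",
"model_loss, coin_flip_loss = holdout_log_loss_demo()\n",
"print('Hold-out log loss: {:.4f} (coin-flip baseline: {:.4f})'.format(model_loss, coin_flip_loss))"
]
},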
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2. Average Select Ranking Systems\n",
"We pull and average the four best-performing ranking systems from the analysis above:\n",
"1. SAG \n",
"2. WLK\n",
"3. POM\n",
"4. MOR"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Start from the season-end Elo table and attach each system's final-day rank\n",
"df_topranks = season_elos.rename(columns={'team_id': 'Team_ID', 'season': 'Season'})\n",
"\n",
"for system in ['SAG', 'WLK', 'POM', 'MOR']:\n",
"    df_temp = df_massey.loc[(df_massey['RankingDayNum'] == final_day) & (df_massey['SystemName'] == system)]\n",
"    df_temp = df_temp.drop(labels=['RankingDayNum', 'SystemName'], axis=1)\n",
"    df_temp = df_temp.rename(columns={'OrdinalRank': system, 'TeamID': 'Team_ID'})\n",
"    df_topranks = pd.merge(left=df_topranks, right=df_temp, how='left', on=['Season', 'Team_ID'])"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>season_elo</th>\n",
" <th>Team_ID</th>\n",
" <th>SAG</th>\n",
" <th>WLK</th>\n",
" <th>POM</th>\n",
" <th>MOR</th>\n",
" <th>MeanRank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014</td>\n",
" <td>1317.05</td>\n",
" <td>1101</td>\n",
" <td>346.0</td>\n",
" <td>330.0</td>\n",
" <td>348.0</td>\n",
" <td>349.0</td>\n",
" <td>343.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1201.11</td>\n",
" <td>1101</td>\n",
" <td>336.0</td>\n",
" <td>332.0</td>\n",
" <td>332.0</td>\n",
" <td>346.0</td>\n",
" <td>336.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016</td>\n",
" <td>1213.74</td>\n",
" <td>1101</td>\n",
" <td>320.0</td>\n",
" <td>304.0</td>\n",
" <td>318.0</td>\n",
" <td>311.0</td>\n",
" <td>313.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2017</td>\n",
" <td>1233.86</td>\n",
" <td>1101</td>\n",
" <td>305.0</td>\n",
" <td>307.0</td>\n",
" <td>300.0</td>\n",
" <td>317.0</td>\n",
" <td>307.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2003</td>\n",
" <td>1452.53</td>\n",
" <td>1102</td>\n",
" <td>149.0</td>\n",
" <td>165.0</td>\n",
" <td>160.0</td>\n",
" <td>132.0</td>\n",
" <td>151.50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season season_elo Team_ID SAG WLK POM MOR MeanRank\n",
"0 2014 1317.05 1101 346.0 330.0 348.0 349.0 343.25\n",
"1 2015 1201.11 1101 336.0 332.0 332.0 346.0 336.50\n",
"2 2016 1213.74 1101 320.0 304.0 318.0 311.0 313.25\n",
"3 2017 1233.86 1101 305.0 307.0 300.0 317.0 307.25\n",
"22 2003 1452.53 1102 149.0 165.0 160.0 132.0 151.50"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Mean rank across the four systems (SAG, WLK, POM, MOR)\n",
"df_topranks['MeanRank'] = (df_topranks['SAG'] + df_topranks['WLK'] + df_topranks['POM'] + df_topranks['MOR']) / 4\n",
"# Drop team-seasons that are missing any of the four rankings\n",
"df_topranks.dropna(inplace=True)\n",
"df_topranks.head()"
]
},
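{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same row-wise average can be written with `DataFrame.mean(axis=1)`, which avoids spelling out each column if more systems are added later. A minimal sketch on a toy frame that reuses the SAG/WLK/POM/MOR ranks of rows 0 and 22 shown above. `skipna=False` keeps the result NaN when any system is missing, which matches the explicit sum divided by 4. The `_demo` names are placeholders chosen so the notebook's own variables are not overwritten."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy frame: ranks copied from rows 0 and 22 of the output above\n",
"ranks_demo = pd.DataFrame({\n",
"    'SAG': [346.0, 149.0],\n",
"    'WLK': [330.0, 165.0],\n",
"    'POM': [348.0, 160.0],\n",
"    'MOR': [349.0, 132.0],\n",
"})\n",
"\n",
"rank_systems = ['SAG', 'WLK', 'POM', 'MOR']\n",
"# Row-wise mean; skipna=False propagates NaN like the explicit sum / 4\n",
"ranks_demo['MeanRank'] = ranks_demo[rank_systems].mean(axis=1, skipna=False)\n",
"print(ranks_demo['MeanRank'].tolist())  # [343.25, 151.5]"
]
},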
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>season_elo_x</th>\n",
" <th>W_MeanRank</th>\n",
" <th>season_elo_y</th>\n",
" <th>L_MeanRank</th>\n",
" <th>MeanRank_Diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>1318.06</td>\n",
" <td>259.50</td>\n",
" <td>1288.79</td>\n",
" <td>264.50</td>\n",
" <td>-5.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>2051.08</td>\n",
" <td>2.75</td>\n",
" <td>1442.8</td>\n",
" <td>160.50</td>\n",
" <td>-157.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>1787.95</td>\n",
" <td>30.00</td>\n",
" <td>1833.37</td>\n",
" <td>22.00</td>\n",
" <td>8.00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>1663.71</td>\n",
" <td>45.00</td>\n",
" <td>1835.58</td>\n",
" <td>24.25</td>\n",
" <td>20.75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>1862.13</td>\n",
" <td>39.00</td>\n",
" <td>1825.56</td>\n",
" <td>44.00</td>\n",
" <td>-5.00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID season_elo_x W_MeanRank season_elo_y \\\n",
"0 2003 134 1421 1411 1318.06 259.50 1288.79 \n",
"1 2003 136 1112 1436 2051.08 2.75 1442.8 \n",
"2 2003 136 1113 1272 1787.95 30.00 1833.37 \n",
"3 2003 136 1141 1166 1663.71 45.00 1835.58 \n",
"4 2003 136 1143 1301 1862.13 39.00 1825.56 \n",
"\n",
" L_MeanRank MeanRank_Diff \n",
"0 264.50 -5.00 \n",
"1 160.50 -157.75 \n",
"2 22.00 8.00 \n",
"3 24.25 20.75 \n",
"4 44.00 -5.00 "
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Attach each team's mean rank to the winner and loser of every tournament game\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"df_topranks.drop(labels=['SAG', 'WLK', 'POM', 'MOR'], inplace=True, axis=1)\n",
"\n",
"df_win_ranks = df_topranks.rename(columns={'Team_ID':'WTeamID', 'MeanRank':'W_MeanRank'})\n",
"df_loss_ranks = df_topranks.rename(columns={'Team_ID':'LTeamID', 'MeanRank':'L_MeanRank'})\n",
"df_dummy = pd.merge(left=df_tour, right=df_win_ranks, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_ranks, on=['Season', 'LTeamID'])\n",
"# A negative difference means the winner had the better (lower) mean rank\n",
"df_concat['MeanRank_Diff'] = df_concat['W_MeanRank'] - df_concat['L_MeanRank']\n",
"df_concat.head()"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>MeanRank_Diff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>-5.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>-157.75</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>8.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>20.75</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>-5.00</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID MeanRank_Diff Result\n",
"0 2003 134 1421 1411 -5.00 1\n",
"1 2003 136 1112 1436 -157.75 1\n",
"2 2003 136 1113 1272 8.00 1\n",
"3 2003 136 1141 1166 20.75 1\n",
"4 2003 136 1143 1301 -5.00 1"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Training rows: each game once as a win (Result = 1) and once mirrored as a loss (Result = 0, MeanRank_Diff negated)\n",
"df_wins = pd.DataFrame()\n",
"df_wins['Season'] = df_concat['Season']\n",
"df_wins['DayNum'] = df_concat['DayNum']\n",
"df_wins['WTeamID'] = df_concat['WTeamID']\n",
"df_wins['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_wins['MeanRank_Diff'] = df_concat['MeanRank_Diff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['Season'] = df_concat['Season']\n",
"df_losses['DayNum'] = df_concat['DayNum']\n",
"df_losses['WTeamID'] = df_concat['WTeamID']\n",
"df_losses['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_losses['MeanRank_Diff'] = -df_concat['MeanRank_Diff']\n",
"df_losses['Result'] = 0\n",
"\n",
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
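{
"cell_type": "markdown",
"metadata": {},
"source": [
"The win/loss mirroring above, which is repeated for the composite model further down, can be wrapped in a small helper. `make_mirrored` is a hypothetical name and not part of the original notebook. This sketch assumes the games frame already has `Season`, `DayNum`, `WTeamID`, `LTeamID` and one feature column, and it uses the first two rows of the output above as toy input."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def make_mirrored(df_games, feature):\n",
"    # Hypothetical helper: each game once as a win (Result = 1) and\n",
"    # once as a mirrored loss (Result = 0) with the feature negated\n",
"    cols = ['Season', 'DayNum', 'WTeamID', 'LTeamID', feature]\n",
"    wins = df_games[cols].copy()\n",
"    wins['Result'] = 1\n",
"    losses = df_games[cols].copy()\n",
"    losses[feature] = -losses[feature]\n",
"    losses['Result'] = 0\n",
"    return pd.concat((wins, losses))\n",
"\n",
"toy_games = pd.DataFrame({\n",
"    'Season': [2003, 2003],\n",
"    'DayNum': [134, 136],\n",
"    'WTeamID': [1421, 1112],\n",
"    'LTeamID': [1411, 1436],\n",
"    'MeanRank_Diff': [-5.00, -157.75],\n",
"})\n",
"mirrored_demo = make_mirrored(toy_games, 'MeanRank_Diff')\n",
"print(mirrored_demo)"
]
},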
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1890"
]
},
"execution_count": 59,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove play-in games (DayNum 134-135); what remains is 63 games x 15 seasons (2003-2017) x 2 mirrored rows = 1890\n",
"df_predictions = df_predictions.loc[df_predictions['DayNum'] > 135]\n",
"len(df_predictions)"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.5450159995753735"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Train on the 2003-2013 tournaments, test on 2014-2017\n",
"df_train = df_predictions.loc[df_predictions['Season'] < 2014]\n",
"df_test = df_predictions.loc[df_predictions['Season'] >= 2014]\n",
"\n",
"X_train = df_train['MeanRank_Diff'].values.reshape(-1,1)\n",
"Y_train = df_train['Result'].values\n",
"\n",
"X_test = df_test['MeanRank_Diff'].values.reshape(-1,1)\n",
"Y_test = df_test['Result'].values\n",
"\n",
"logreg = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf2 = GridSearchCV(logreg, params, scoring='neg_log_loss', refit=True)\n",
"clf2.fit(X_train, Y_train)\n",
"# Training score: negative log loss (closer to 0 is better)\n",
"clf2.score(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confusion Matrix: \n",
"[[185 67]\n",
" [ 67 185]] \n",
"\n",
" precision recall f1-score support\n",
"\n",
" 0 0.73 0.73 0.73 252\n",
" 1 0.73 0.73 0.73 252\n",
"\n",
"avg / total 0.73 0.73 0.73 504\n",
"\n"
]
}
],
"source": [
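"# Test-set (2014-2017) results; predict with clf2 so that Y_pred comes from this model\n",
"Y_pred = clf2.predict(X_test)\n",
"print('Confusion Matrix: ')\n",
"print(confusion_matrix(Y_test, Y_pred), '\\n')\n",
"print(classification_report(Y_test, Y_pred))"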
]
},
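{
"cell_type": "markdown",
"metadata": {},
"source": [
"`confusion_matrix` and `classification_report` only describe the model that produced `Y_pred`, so the predictions should be regenerated from the fitted search immediately before reporting. Below is a self-contained sketch of that step on synthetic mirrored data, which is made up for illustration and is not the tournament data. It also reads the chosen `C` from `best_params_`. The `_demo` names are placeholders that avoid overwriting notebook variables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.metrics import confusion_matrix, classification_report\n",
"\n",
"def mirror_rows(d):\n",
"    # Each made-up game appears once as (x, 1) and once as (-x, 0)\n",
"    X = np.concatenate([d, -d]).reshape(-1, 1)\n",
"    y = np.concatenate([np.ones(len(d), dtype=int), np.zeros(len(d), dtype=int)])\n",
"    return X, y\n",
"\n",
"rng_demo = np.random.RandomState(0)\n",
"diff_demo = rng_demo.normal(loc=0.5, scale=1.0, size=300)\n",
"X_tr_demo, y_tr_demo = mirror_rows(diff_demo[:200])\n",
"X_te_demo, y_te_demo = mirror_rows(diff_demo[200:])\n",
"\n",
"params_demo = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf_demo = GridSearchCV(LogisticRegression(), params_demo, scoring='neg_log_loss', refit=True)\n",
"clf_demo.fit(X_tr_demo, y_tr_demo)\n",
"\n",
"# Regenerate predictions from this model before reporting on it\n",
"y_pred_demo = clf_demo.predict(X_te_demo)\n",
"print('Best C:', clf_demo.best_params_['C'])\n",
"print(confusion_matrix(y_te_demo, y_pred_demo))\n",
"print(classification_report(y_te_demo, y_pred_demo))"
]
},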
{
"cell_type": "markdown",
"metadata": {},
"source": [
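"### 3. Composite Model\n",
"Min-max scale the Elo ratings and the mean rankings to [0, 1], inverting the rank so that higher is better. Combine them into a weighted composite score, then fit a logistic regression on the difference in composite scores between the two teams."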
]
},
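{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a minimal sketch of the composite score described above, using three made-up teams (illustrative values only). Both features are min-max scaled to [0, 1], and the rank is inverted so that 1 is the strongest team. Model 1 then weights the rank twice as heavily as Elo. The notebook applies the same steps to the full season table in the cells that follow."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"# Three hypothetical teams: season Elo and mean rank across systems\n",
"elo_demo = np.array([1200.0, 1500.0, 2000.0]).reshape(-1, 1)\n",
"mean_rank_demo = np.array([340.0, 150.0, 3.0]).reshape(-1, 1)\n",
"\n",
"# Scale to [0, 1]; invert the rank so that 1 is the best-ranked team\n",
"elo_scaled_demo = MinMaxScaler().fit_transform(elo_demo).ravel()\n",
"rank_scaled_demo = 1 - MinMaxScaler().fit_transform(mean_rank_demo).ravel()\n",
"\n",
"# Model 1 weighting: (Elo + 2 * rank) / 3\n",
"composite_demo = (elo_scaled_demo + 2 * rank_scaled_demo) / 3\n",
"print(composite_demo.round(3))"
]
},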
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Elo</th>\n",
" <th>Team_ID</th>\n",
" <th>season_elo</th>\n",
" <th>MeanRank</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014</td>\n",
" <td>1317.05</td>\n",
" <td>1101</td>\n",
" <td>1317.05</td>\n",
" <td>343.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1201.11</td>\n",
" <td>1101</td>\n",
" <td>1201.11</td>\n",
" <td>336.50</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016</td>\n",
" <td>1213.74</td>\n",
" <td>1101</td>\n",
" <td>1213.74</td>\n",
" <td>313.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2017</td>\n",
" <td>1233.86</td>\n",
" <td>1101</td>\n",
" <td>1233.86</td>\n",
" <td>307.25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2003</td>\n",
" <td>1452.53</td>\n",
" <td>1102</td>\n",
" <td>1452.53</td>\n",
" <td>151.50</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Elo Team_ID season_elo MeanRank\n",
"0 2014 1317.05 1101 1317.05 343.25\n",
"1 2015 1201.11 1101 1201.11 336.50\n",
"2 2016 1213.74 1101 1213.74 313.25\n",
"3 2017 1233.86 1101 1233.86 307.25\n",
"22 2003 1452.53 1102 1452.53 151.50"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Merge season Elo ratings with mean rankings; drop team-seasons missing either\n",
"season_elos = season_elos.rename(columns={'team_id':'Team_ID', 'season':'Season', 'season_elo':'Elo'})\n",
"df = pd.merge(left=season_elos, right=df_topranks, how='left', on=['Season', 'Team_ID'])\n",
"df.dropna(inplace=True)\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
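"# Min-max scale both features to [0, 1] (fit on all seasons, test seasons included)\n",
"scaler = preprocessing.MinMaxScaler(feature_range=(0,1))\n",
"df['Elo_Scaled'] = scaler.fit_transform(df['Elo'].values.reshape(-1,1))\n",
"# Refit on MeanRank and invert it, so that 1 is the best-ranked team\n",
"df['MeanRank_Scaled'] = 1 - scaler.fit_transform(df['MeanRank'].values.reshape(-1,1))"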
]
},
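{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above fits the scaler on every season, so the 2014-2017 test seasons influence the scaling. One possible refinement, which this notebook does not do, is to fit the scaler on the training seasons only and reuse it for all rows. Test values can then fall outside [0, 1]. A sketch with made-up Elo values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn.preprocessing import MinMaxScaler\n",
"\n",
"# Made-up season Elo values; seasons before 2014 are the training seasons\n",
"df_demo = pd.DataFrame({'Season': [2012, 2013, 2014, 2015],\n",
"                        'Elo': [1300.0, 1900.0, 1600.0, 2100.0]})\n",
"train_mask_demo = df_demo['Season'] < 2014\n",
"\n",
"# Fit on training rows only, then transform every row with that min/max\n",
"scaler_demo = MinMaxScaler(feature_range=(0, 1))\n",
"scaler_demo.fit(df_demo.loc[train_mask_demo, ['Elo']])\n",
"df_demo['Elo_Scaled'] = scaler_demo.transform(df_demo[['Elo']]).ravel()\n",
"print(df_demo)"
]
},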
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Elo</th>\n",
" <th>Team_ID</th>\n",
" <th>season_elo</th>\n",
" <th>MeanRank</th>\n",
" <th>Elo_Scaled</th>\n",
" <th>MeanRank_Scaled</th>\n",
" <th>Composite Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014</td>\n",
" <td>1317.05</td>\n",
" <td>1101</td>\n",
" <td>1317.05</td>\n",
" <td>343.25</td>\n",
" <td>0.377452</td>\n",
" <td>0.022143</td>\n",
" <td>0.199798</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2015</td>\n",
" <td>1201.11</td>\n",
" <td>1101</td>\n",
" <td>1201.11</td>\n",
" <td>336.50</td>\n",
" <td>0.289849</td>\n",
" <td>0.041429</td>\n",
" <td>0.165639</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2016</td>\n",
" <td>1213.74</td>\n",
" <td>1101</td>\n",
" <td>1213.74</td>\n",
" <td>313.25</td>\n",
" <td>0.299388</td>\n",
" <td>0.107857</td>\n",
" <td>0.203622</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2017</td>\n",
" <td>1233.86</td>\n",
" <td>1101</td>\n",
" <td>1233.86</td>\n",
" <td>307.25</td>\n",
" <td>0.314596</td>\n",
" <td>0.125000</td>\n",
" <td>0.219798</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2003</td>\n",
" <td>1452.53</td>\n",
" <td>1102</td>\n",
" <td>1452.53</td>\n",
" <td>151.50</td>\n",
" <td>0.479827</td>\n",
" <td>0.570000</td>\n",
" <td>0.524914</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Elo Team_ID season_elo MeanRank Elo_Scaled \\\n",
"0 2014 1317.05 1101 1317.05 343.25 0.377452 \n",
"1 2015 1201.11 1101 1201.11 336.50 0.289849 \n",
"2 2016 1213.74 1101 1213.74 313.25 0.299388 \n",
"3 2017 1233.86 1101 1233.86 307.25 0.314596 \n",
"22 2003 1452.53 1102 1452.53 151.50 0.479827 \n",
"\n",
" MeanRank_Scaled Composite Score \n",
"0 0.022143 0.199798 \n",
"1 0.041429 0.165639 \n",
"2 0.107857 0.203622 \n",
"3 0.125000 0.219798 \n",
"22 0.570000 0.524914 "
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
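"# Model 1: weight the mean ranking twice as heavily as Elo\n",
"df['Composite Score'] = (df['Elo_Scaled'] + (2 * df['MeanRank_Scaled'])) / 3\n",
"df.head()\n",
"\n",
"# Model 2: equal weights (uncomment to use, and switch the pickle filename below)\n",
"#df['Composite Score'] = (df['Elo_Scaled'] + df['MeanRank_Scaled']) / 2\n",
"#df.head()"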
]
},
{
"cell_type": "code",
"execution_count": 65,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>W_Composite</th>\n",
" <th>L_Composite</th>\n",
" <th>Composite_Diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>0.319824</td>\n",
" <td>0.301622</td>\n",
" <td>0.018201</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>0.963552</td>\n",
" <td>0.508381</td>\n",
" <td>0.455171</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>0.825212</td>\n",
" <td>0.853798</td>\n",
" <td>-0.028586</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>0.756841</td>\n",
" <td>0.851419</td>\n",
" <td>-0.094578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>0.840379</td>\n",
" <td>0.819419</td>\n",
" <td>0.020960</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID W_Composite L_Composite Composite_Diff\n",
"0 2003 134 1421 1411 0.319824 0.301622 0.018201\n",
"1 2003 136 1112 1436 0.963552 0.508381 0.455171\n",
"2 2003 136 1113 1272 0.825212 0.853798 -0.028586\n",
"3 2003 136 1141 1166 0.756841 0.851419 -0.094578\n",
"4 2003 136 1143 1301 0.840379 0.819419 0.020960"
]
},
"execution_count": 65,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Attach composite scores to the winner and loser of every tournament game\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"df.drop(labels=['Elo', 'season_elo', 'MeanRank'], inplace=True, axis=1)\n",
"\n",
"df_win_elos = df.rename(columns={'Team_ID':'WTeamID', 'Composite Score':'W_Composite'})\n",
"df_loss_elos = df.rename(columns={'Team_ID':'LTeamID', 'Composite Score':'L_Composite'}) \n",
"df_dummy = pd.merge(left=df_tour, right=df_win_elos, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_elos, on=['Season', 'LTeamID'])\n",
"df_concat['Composite_Diff'] = df_concat['W_Composite'] - df_concat['L_Composite']\n",
"df_total = df_concat[['Season', 'DayNum', 'WTeamID', 'LTeamID', 'W_Composite', 'L_Composite', 'Composite_Diff']]\n",
"df_total.head()"
]
},
{
"cell_type": "code",
"execution_count": 66,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>0.018201</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>0.455171</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>-0.028586</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>-0.094578</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>0.020960</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID Composite_Diff Result\n",
"0 2003 134 1421 1411 0.018201 1\n",
"1 2003 136 1112 1436 0.455171 1\n",
"2 2003 136 1113 1272 -0.028586 1\n",
"3 2003 136 1141 1166 -0.094578 1\n",
"4 2003 136 1143 1301 0.020960 1"
]
},
"execution_count": 66,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Training rows: each game once as a win (Result = 1) and once mirrored as a loss (Result = 0, Composite_Diff negated)\n",
"df_wins = pd.DataFrame()\n",
"df_wins['Season'] = df_concat['Season']\n",
"df_wins['DayNum'] = df_concat['DayNum']\n",
"df_wins['WTeamID'] = df_concat['WTeamID']\n",
"df_wins['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_wins['Composite_Diff'] = df_concat['Composite_Diff']\n",
"df_wins['Result'] = 1\n",
"\n",
"df_losses = pd.DataFrame()\n",
"df_losses['Season'] = df_concat['Season']\n",
"df_losses['DayNum'] = df_concat['DayNum']\n",
"df_losses['WTeamID'] = df_concat['WTeamID']\n",
"df_losses['LTeamID'] = df_concat['LTeamID']\n",
"\n",
"df_losses['Composite_Diff'] = -df_concat['Composite_Diff']\n",
"df_losses['Result'] = 0\n",
"\n",
"df_predictions = pd.concat((df_wins, df_losses))\n",
"df_predictions.head()"
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1890"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Remove play-in games (DayNum 134-135); what remains is 63 games x 15 seasons (2003-2017) x 2 mirrored rows = 1890\n",
"df_predictions = df_predictions.loc[df_predictions['DayNum'] > 135]\n",
"len(df_predictions)"
]
},
{
"cell_type": "code",
"execution_count": 68,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"GridSearchCV(cv=None, error_score='raise',\n",
" estimator=LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,\n",
" intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,\n",
" penalty='l2', random_state=None, solver='liblinear', tol=0.0001,\n",
" verbose=0, warm_start=False),\n",
" fit_params=None, iid=True, n_jobs=1,\n",
" param_grid={'C': array([ 1.00000e-05, 1.29155e-04, 1.66810e-03, 2.15443e-02,\n",
" 2.78256e-01, 3.59381e+00, 4.64159e+01, 5.99484e+02,\n",
" 7.74264e+03, 1.00000e+05])},\n",
" pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',\n",
" scoring='neg_log_loss', verbose=0)"
]
},
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
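"# Train on the 2003-2013 tournaments, test on 2014-2017\n",
"df_train = df_predictions.loc[df_predictions['Season'] < 2014]\n",
"# .copy() so that adding the Pred/Prob columns later does not raise SettingWithCopyWarning\n",
"df_test = df_predictions.loc[df_predictions['Season'] >= 2014].copy()\n",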
"\n",
"X_train = df_train['Composite_Diff'].values.reshape(-1,1)\n",
"Y_train = df_train['Result'].values\n",
"\n",
"X_test = df_test['Composite_Diff'].values.reshape(-1,1)\n",
"Y_test = df_test['Result'].values\n",
"\n",
"logreg = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf3 = GridSearchCV(logreg, params, scoring='neg_log_loss', refit=True)\n",
"clf3"
]
},
{
"cell_type": "code",
"execution_count": 69,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.5427760668455921"
]
},
"execution_count": 69,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Fit, then report the training score (negative log loss)\n",
"clf3.fit(X_train, Y_train)\n",
"clf3.score(X_train, Y_train)"
]
},
{
"cell_type": "code",
"execution_count": 70,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Save the fitted model (use ncaa_tourney2.pkl for Model 2)\n",
"filename = 'ncaa_tourney1.pkl'\n",
"#filename = 'ncaa_tourney2.pkl'\n",
"with open(filename, 'wb') as f:\n",
"    pickle.dump(clf3, f)"
]
},
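{
"cell_type": "markdown",
"metadata": {},
"source": [
"A saved model can be loaded back with `pickle.load` in a later session, for example to score new matchups. The following self-contained sketch shows the round trip using a tiny stand-in `LogisticRegression` and a hypothetical file name instead of `clf3`. Only unpickle files you trust, because loading a pickle can execute arbitrary code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"# Tiny stand-in model fit on a made-up composite difference\n",
"X_demo = np.array([[-0.3], [-0.1], [0.1], [0.3]])\n",
"y_demo = np.array([0, 0, 1, 1])\n",
"model_demo = LogisticRegression().fit(X_demo, y_demo)\n",
"\n",
"demo_path = 'ncaa_tourney_demo.pkl'  # hypothetical file name\n",
"with open(demo_path, 'wb') as f:\n",
"    pickle.dump(model_demo, f)\n",
"\n",
"with open(demo_path, 'rb') as f:\n",
"    loaded_demo = pickle.load(f)\n",
"\n",
"# The reloaded model reproduces the original probabilities\n",
"print(loaded_demo.predict_proba(X_demo))"
]
},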
{
"cell_type": "code",
"execution_count": 71,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Confusion Matrix: \n",
"[[185 67]\n",
" [ 67 185]] \n",
"\n",
" precision recall f1-score support\n",
"\n",
" 0 0.73 0.73 0.73 252\n",
" 1 0.73 0.73 0.73 252\n",
"\n",
"avg / total 0.73 0.73 0.73 504\n",
"\n"
]
}
],
"source": [
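"# Test-set (2014-2017) results; predict with clf3 so that Y_pred comes from this model\n",
"Y_pred = clf3.predict(X_test)\n",
"print('Confusion Matrix: ')\n",
"print(confusion_matrix(Y_test, Y_pred), '\\n')\n",
"print(classification_report(Y_test, Y_pred))"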
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Model Performance"
]
},
{
"cell_type": "code",
"execution_count": 72,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.51075848153406123"
]
},
"execution_count": 72,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Test-set (2014-2017) score: negative log loss (closer to 0 is better)\n",
"Y_pred = clf3.predict(X_test)\n",
"df_test['Pred'] = Y_pred\n",
"clf3.score(X_test, Y_test)"
]
},
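{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `scoring='neg_log_loss'`, `GridSearchCV.score` returns the negated log loss of the refit model's `predict_proba` output. The -0.511 above therefore corresponds to a log loss of about 0.511, where lower is better. The following sketch checks this equivalence with `sklearn.metrics.log_loss` on synthetic mirrored data that is made up for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import GridSearchCV\n",
"from sklearn.metrics import log_loss\n",
"\n",
"# Synthetic mirrored data: (x, 1) and (-x, 0) for each made-up game\n",
"rng_demo = np.random.RandomState(1)\n",
"diff_demo = rng_demo.normal(loc=0.1, scale=0.2, size=100)\n",
"X_demo = np.concatenate([diff_demo, -diff_demo]).reshape(-1, 1)\n",
"y_demo = np.concatenate([np.ones(100, dtype=int), np.zeros(100, dtype=int)])\n",
"\n",
"search_demo = GridSearchCV(LogisticRegression(), {'C': np.logspace(-5, 5, 10)},\n",
"                           scoring='neg_log_loss', refit=True)\n",
"search_demo.fit(X_demo, y_demo)\n",
"\n",
"# score() with neg_log_loss equals minus log_loss on predict_proba\n",
"manual_loss = log_loss(y_demo, search_demo.predict_proba(X_demo))\n",
"print(search_demo.score(X_demo, y_demo), -manual_loss)"
]
},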
{
"cell_type": "code",
"execution_count": 73,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Result</th>\n",
" <th>Pred</th>\n",
" <th>Prob</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>717</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>1163</td>\n",
" <td>1386</td>\n",
" <td>0.091565</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.683972</td>\n",
" </tr>\n",
" <tr>\n",
" <th>718</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>1173</td>\n",
" <td>1326</td>\n",
" <td>-0.137862</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.761780</td>\n",
" </tr>\n",
" <tr>\n",
" <th>719</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>1196</td>\n",
" <td>1107</td>\n",
" <td>0.483541</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.983329</td>\n",
" </tr>\n",
" <tr>\n",
" <th>720</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>1217</td>\n",
" <td>1153</td>\n",
" <td>-0.080948</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0.664310</td>\n",
" </tr>\n",
" <tr>\n",
" <th>721</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>1257</td>\n",
" <td>1264</td>\n",
" <td>0.271877</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0.908253</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID Composite_Diff Result Pred Prob\n",
"717 2014 136 1163 1386 0.091565 1 1 0.683972\n",
"718 2014 136 1173 1326 -0.137862 1 0 0.761780\n",
"719 2014 136 1196 1107 0.483541 1 1 0.983329\n",
"720 2014 136 1217 1153 -0.080948 1 0 0.664310\n",
"721 2014 136 1257 1264 0.271877 1 1 0.908253"
]
},
"execution_count": 73,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Probability the model assigns to its predicted class (always >= 0.5)\n",
"probs = clf3.predict_proba(X_test)\n",
"Y_prob = probs.max(axis=1)\n",
"df_test['Prob'] = Y_prob\n",
"\n",
"df_test.head()"
]
},
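{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `Prob` column holds the probability of the predicted class, which is always at least 0.5. That is not the same as the probability that a given team wins. Log loss, and any win-probability submission, needs the probability of one fixed outcome, such as column 1 of `predict_proba` (the probability that `Result` is 1). The sketch below uses the probabilities of rows 717 and 718 above, with column 0 recovered as one minus column 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# predict_proba-style output: column 0 = P(Result == 0), column 1 = P(Result == 1)\n",
"# Values reconstructed from rows 717 and 718 of the table above\n",
"probs_demo = np.array([[0.316028, 0.683972],\n",
"                       [0.761780, 0.238220]])\n",
"\n",
"# What the notebook stores in 'Prob': confidence in the predicted class\n",
"prob_predicted = probs_demo.max(axis=1)\n",
"\n",
"# Probability that Result == 1, regardless of which class is predicted\n",
"prob_result1 = probs_demo[:, 1]\n",
"print(prob_predicted, prob_result1)"
]
},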
{
"cell_type": "code",
"execution_count": 74,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamName</th>\n",
" <th>LTeamName</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Prob</th>\n",
" <th>Pred</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>252</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>Connecticut</td>\n",
" <td>St Joseph's PA</td>\n",
" <td>-0.091565</td>\n",
" <td>0.683972</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>253</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>Dayton</td>\n",
" <td>Ohio St</td>\n",
" <td>0.137862</td>\n",
" <td>0.761780</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>254</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>Florida</td>\n",
" <td>Albany NY</td>\n",
" <td>-0.483541</td>\n",
" <td>0.983329</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>255</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>Harvard</td>\n",
" <td>Cincinnati</td>\n",
" <td>0.080948</td>\n",
" <td>0.664310</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>256</th>\n",
" <td>2014</td>\n",
" <td>136</td>\n",
" <td>Louisville</td>\n",
" <td>Manhattan</td>\n",
" <td>-0.271877</td>\n",
" <td>0.908253</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamName LTeamName Composite_Diff Prob \\\n",
"252 2014 136 Connecticut St Joseph's PA -0.091565 0.683972 \n",
"253 2014 136 Dayton Ohio St 0.137862 0.761780 \n",
"254 2014 136 Florida Albany NY -0.483541 0.983329 \n",
"255 2014 136 Harvard Cincinnati 0.080948 0.664310 \n",
"256 2014 136 Louisville Manhattan -0.271877 0.908253 \n",
"\n",
" Pred Result \n",
"252 0 0 \n",
"253 1 0 \n",
"254 0 0 \n",
"255 1 0 \n",
"256 0 0 "
]
},
"execution_count": 74,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Replace team IDs with team names for readability\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"teams = pd.read_csv(data_dir + 'teams.csv')\n",
"\n",
"df_dummy = teams.rename(columns={'TeamID':'WTeamID'})\n",
"df_results = pd.merge(left=df_test, right=df_dummy, how='left', on=['WTeamID'])\n",
"\n",
"df_dummy = teams.rename(columns={'TeamID':'LTeamID'})\n",
"df_results = pd.merge(left=df_results, right=df_dummy, how='left', on=['LTeamID'])\n",
"\n",
"df_results = df_results.rename(columns={'TeamName_x':'WTeamName', 'TeamName_y':'LTeamName'})\n",
"df_results = df_results[['Season', 'DayNum', 'WTeamName', 'LTeamName', 'Composite_Diff', 'Prob', 'Pred', 'Result']]\n",
"# Keep one row per game: the mirrored Result = 0 copy, which comes last\n",
"df_results.drop_duplicates(subset=['Season','DayNum','WTeamName'], keep='last', inplace=True)\n",
"df_results.head()"
]
},
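{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two rename-and-merge steps above can also be done with a single ID-to-name lookup and `Series.map`, which keeps the original row order and row count. The sketch below works on a toy subset of the teams table. Its IDs and names are taken from the outputs above: Dayton 1173, Ohio St 1326, Florida 1196, Albany NY 1107."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Toy subset of the teams table\n",
"teams_demo = pd.DataFrame({'TeamID': [1173, 1326, 1196, 1107],\n",
"                           'TeamName': ['Dayton', 'Ohio St', 'Florida', 'Albany NY']})\n",
"matchups_demo = pd.DataFrame({'WTeamID': [1173, 1196], 'LTeamID': [1326, 1107]})\n",
"\n",
"# Build the ID -> name lookup once and map both ID columns through it\n",
"id_to_name = teams_demo.set_index('TeamID')['TeamName']\n",
"matchups_demo['WTeamName'] = matchups_demo['WTeamID'].map(id_to_name)\n",
"matchups_demo['LTeamName'] = matchups_demo['LTeamID'].map(id_to_name)\n",
"print(matchups_demo)"
]
},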
{
"cell_type": "code",
"execution_count": 75,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamName</th>\n",
" <th>LTeamName</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Prob</th>\n",
" <th>Pred</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>499</th>\n",
" <td>2017</td>\n",
" <td>146</td>\n",
" <td>North Carolina</td>\n",
" <td>Kentucky</td>\n",
" <td>0.007129</td>\n",
" <td>0.515024</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>500</th>\n",
" <td>2017</td>\n",
" <td>146</td>\n",
" <td>South Carolina</td>\n",
" <td>Florida</td>\n",
" <td>0.091816</td>\n",
" <td>0.684429</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>498</th>\n",
" <td>2017</td>\n",
" <td>145</td>\n",
" <td>Oregon</td>\n",
" <td>Kansas</td>\n",
" <td>0.041307</td>\n",
" <td>0.586205</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>496</th>\n",
" <td>2017</td>\n",
" <td>144</td>\n",
" <td>South Carolina</td>\n",
" <td>Baylor</td>\n",
" <td>0.089609</td>\n",
" <td>0.680396</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>492</th>\n",
" <td>2017</td>\n",
" <td>143</td>\n",
" <td>Xavier</td>\n",
" <td>Arizona</td>\n",
" <td>0.098212</td>\n",
" <td>0.695959</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>484</th>\n",
" <td>2017</td>\n",
" <td>139</td>\n",
" <td>Michigan</td>\n",
" <td>Louisville</td>\n",
" <td>0.057800</td>\n",
" <td>0.619488</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>487</th>\n",
" <td>2017</td>\n",
" <td>139</td>\n",
" <td>South Carolina</td>\n",
" <td>Duke</td>\n",
" <td>0.121985</td>\n",
" <td>0.736642</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>479</th>\n",
" <td>2017</td>\n",
" <td>138</td>\n",
" <td>Wisconsin</td>\n",
" <td>Villanova</td>\n",
" <td>0.091102</td>\n",
" <td>0.683126</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>475</th>\n",
" <td>2017</td>\n",
" <td>138</td>\n",
" <td>Florida</td>\n",
" <td>Virginia</td>\n",
" <td>0.024714</td>\n",
" <td>0.551910</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>480</th>\n",
" <td>2017</td>\n",
" <td>138</td>\n",
" <td>Xavier</td>\n",
" <td>Florida St</td>\n",
" <td>0.031929</td>\n",
" <td>0.566904</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>457</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Arkansas</td>\n",
" <td>Seton Hall</td>\n",
" <td>0.001082</td>\n",
" <td>0.502280</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>465</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Michigan St</td>\n",
" <td>Miami FL</td>\n",
" <td>0.008130</td>\n",
" <td>0.517131</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>468</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Rhode Island</td>\n",
" <td>Creighton</td>\n",
" <td>0.044419</td>\n",
" <td>0.592556</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>471</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>USC</td>\n",
" <td>SMU</td>\n",
" <td>0.135217</td>\n",
" <td>0.757709</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>469</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>South Carolina</td>\n",
" <td>Marquette</td>\n",
" <td>0.008009</td>\n",
" <td>0.516877</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>447</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>MTSU</td>\n",
" <td>Minnesota</td>\n",
" <td>0.028861</td>\n",
" <td>0.560540</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>448</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Northwestern</td>\n",
" <td>Vanderbilt</td>\n",
" <td>0.019119</td>\n",
" <td>0.540216</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamName LTeamName Composite_Diff Prob \\\n",
"499 2017 146 North Carolina Kentucky 0.007129 0.515024 \n",
"500 2017 146 South Carolina Florida 0.091816 0.684429 \n",
"498 2017 145 Oregon Kansas 0.041307 0.586205 \n",
"496 2017 144 South Carolina Baylor 0.089609 0.680396 \n",
"492 2017 143 Xavier Arizona 0.098212 0.695959 \n",
"484 2017 139 Michigan Louisville 0.057800 0.619488 \n",
"487 2017 139 South Carolina Duke 0.121985 0.736642 \n",
"479 2017 138 Wisconsin Villanova 0.091102 0.683126 \n",
"475 2017 138 Florida Virginia 0.024714 0.551910 \n",
"480 2017 138 Xavier Florida St 0.031929 0.566904 \n",
"457 2017 137 Arkansas Seton Hall 0.001082 0.502280 \n",
"465 2017 137 Michigan St Miami FL 0.008130 0.517131 \n",
"468 2017 137 Rhode Island Creighton 0.044419 0.592556 \n",
"471 2017 137 USC SMU 0.135217 0.757709 \n",
"469 2017 137 South Carolina Marquette 0.008009 0.516877 \n",
"447 2017 136 MTSU Minnesota 0.028861 0.560540 \n",
"448 2017 136 Northwestern Vanderbilt 0.019119 0.540216 \n",
"\n",
" Pred Result \n",
"499 1 0 \n",
"500 1 0 \n",
"498 1 0 \n",
"496 1 0 \n",
"492 1 0 \n",
"484 1 0 \n",
"487 1 0 \n",
"479 1 0 \n",
"475 1 0 \n",
"480 1 0 \n",
"457 1 0 \n",
"465 1 0 \n",
"468 1 0 \n",
"471 1 0 \n",
"469 1 0 \n",
"447 1 0 \n",
"448 1 0 "
]
},
"execution_count": 75,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Wrong answers: games where the predicted class (Pred) does not match the actual Result,\n",
"# latest games (highest DayNum) first\n",
"incorrect = (df_results.loc[df_results['Pred'] != df_results['Result']]\n",
"             .sort_values(by='DayNum', ascending=False))\n",
"\n",
"def get_incorrect_year(year):\n",
"    return incorrect.loc[incorrect['Season'] == year]\n",
"\n",
"get_incorrect_year(2017)"
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamName</th>\n",
" <th>LTeamName</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Prob</th>\n",
" <th>Pred</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>452</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Villanova</td>\n",
" <td>Mt St Mary's</td>\n",
" <td>-0.553039</td>\n",
" <td>0.990653</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>466</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>North Carolina</td>\n",
" <td>TX Southern</td>\n",
" <td>-0.515507</td>\n",
" <td>0.987217</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>461</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Kansas</td>\n",
" <td>UC Davis</td>\n",
" <td>-0.513945</td>\n",
" <td>0.987050</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>463</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Louisville</td>\n",
" <td>Jacksonville St</td>\n",
" <td>-0.484377</td>\n",
" <td>0.983444</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>445</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Gonzaga</td>\n",
" <td>S Dakota St</td>\n",
" <td>-0.424736</td>\n",
" <td>0.972917</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>462</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Kentucky</td>\n",
" <td>N Kentucky</td>\n",
" <td>-0.419841</td>\n",
" <td>0.971808</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>441</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Arizona</td>\n",
" <td>North Dakota</td>\n",
" <td>-0.411020</td>\n",
" <td>0.969697</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>460</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Duke</td>\n",
" <td>Troy</td>\n",
" <td>-0.409618</td>\n",
" <td>0.969348</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>470</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>UCLA</td>\n",
" <td>Kent</td>\n",
" <td>-0.302583</td>\n",
" <td>0.927667</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>467</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Oregon</td>\n",
" <td>Iona</td>\n",
" <td>-0.295587</td>\n",
" <td>0.923608</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>442</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Butler</td>\n",
" <td>Winthrop</td>\n",
" <td>-0.262592</td>\n",
" <td>0.901517</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>454</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>West Virginia</td>\n",
" <td>Bucknell</td>\n",
" <td>-0.251144</td>\n",
" <td>0.892609</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>444</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Florida St</td>\n",
" <td>FL Gulf Coast</td>\n",
" <td>-0.247032</td>\n",
" <td>0.889239</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>458</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Baylor</td>\n",
" <td>New Mexico St</td>\n",
" <td>-0.246920</td>\n",
" <td>0.889146</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>443</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Florida</td>\n",
" <td>ETSU</td>\n",
" <td>-0.222845</td>\n",
" <td>0.867502</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>453</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Virginia</td>\n",
" <td>UNC Wilmington</td>\n",
" <td>-0.197225</td>\n",
" <td>0.840642</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>450</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Purdue</td>\n",
" <td>Vermont</td>\n",
" <td>-0.188957</td>\n",
" <td>0.831080</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>449</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Notre Dame</td>\n",
" <td>Princeton</td>\n",
" <td>-0.139029</td>\n",
" <td>0.763562</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>471</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>USC</td>\n",
" <td>SMU</td>\n",
" <td>0.135217</td>\n",
" <td>0.757709</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>446</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Iowa St</td>\n",
" <td>Nevada</td>\n",
" <td>-0.122716</td>\n",
" <td>0.737836</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>455</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Wisconsin</td>\n",
" <td>Virginia Tech</td>\n",
" <td>-0.099396</td>\n",
" <td>0.698068</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>472</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Wichita St</td>\n",
" <td>Dayton</td>\n",
" <td>-0.091455</td>\n",
" <td>0.683771</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>459</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Cincinnati</td>\n",
" <td>Kansas St</td>\n",
" <td>-0.064799</td>\n",
" <td>0.633297</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>451</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>St Mary's CA</td>\n",
" <td>VA Commonwealth</td>\n",
" <td>-0.054220</td>\n",
" <td>0.612347</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>468</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Rhode Island</td>\n",
" <td>Creighton</td>\n",
" <td>0.044419</td>\n",
" <td>0.592556</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>464</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Michigan</td>\n",
" <td>Oklahoma St</td>\n",
" <td>-0.035108</td>\n",
" <td>0.573473</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>447</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>MTSU</td>\n",
" <td>Minnesota</td>\n",
" <td>0.028861</td>\n",
" <td>0.560540</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>448</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Northwestern</td>\n",
" <td>Vanderbilt</td>\n",
" <td>0.019119</td>\n",
" <td>0.540216</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>465</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Michigan St</td>\n",
" <td>Miami FL</td>\n",
" <td>0.008130</td>\n",
" <td>0.517131</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>469</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>South Carolina</td>\n",
" <td>Marquette</td>\n",
" <td>0.008009</td>\n",
" <td>0.516877</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>456</th>\n",
" <td>2017</td>\n",
" <td>136</td>\n",
" <td>Xavier</td>\n",
" <td>Maryland</td>\n",
" <td>-0.006985</td>\n",
" <td>0.514721</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>457</th>\n",
" <td>2017</td>\n",
" <td>137</td>\n",
" <td>Arkansas</td>\n",
" <td>Seton Hall</td>\n",
" <td>0.001082</td>\n",
" <td>0.502280</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamName LTeamName Composite_Diff \\\n",
"452 2017 136 Villanova Mt St Mary's -0.553039 \n",
"466 2017 137 North Carolina TX Southern -0.515507 \n",
"461 2017 137 Kansas UC Davis -0.513945 \n",
"463 2017 137 Louisville Jacksonville St -0.484377 \n",
"445 2017 136 Gonzaga S Dakota St -0.424736 \n",
"462 2017 137 Kentucky N Kentucky -0.419841 \n",
"441 2017 136 Arizona North Dakota -0.411020 \n",
"460 2017 137 Duke Troy -0.409618 \n",
"470 2017 137 UCLA Kent -0.302583 \n",
"467 2017 137 Oregon Iona -0.295587 \n",
"442 2017 136 Butler Winthrop -0.262592 \n",
"454 2017 136 West Virginia Bucknell -0.251144 \n",
"444 2017 136 Florida St FL Gulf Coast -0.247032 \n",
"458 2017 137 Baylor New Mexico St -0.246920 \n",
"443 2017 136 Florida ETSU -0.222845 \n",
"453 2017 136 Virginia UNC Wilmington -0.197225 \n",
"450 2017 136 Purdue Vermont -0.188957 \n",
"449 2017 136 Notre Dame Princeton -0.139029 \n",
"471 2017 137 USC SMU 0.135217 \n",
"446 2017 136 Iowa St Nevada -0.122716 \n",
"455 2017 136 Wisconsin Virginia Tech -0.099396 \n",
"472 2017 137 Wichita St Dayton -0.091455 \n",
"459 2017 137 Cincinnati Kansas St -0.064799 \n",
"451 2017 136 St Mary's CA VA Commonwealth -0.054220 \n",
"468 2017 137 Rhode Island Creighton 0.044419 \n",
"464 2017 137 Michigan Oklahoma St -0.035108 \n",
"447 2017 136 MTSU Minnesota 0.028861 \n",
"448 2017 136 Northwestern Vanderbilt 0.019119 \n",
"465 2017 137 Michigan St Miami FL 0.008130 \n",
"469 2017 137 South Carolina Marquette 0.008009 \n",
"456 2017 136 Xavier Maryland -0.006985 \n",
"457 2017 137 Arkansas Seton Hall 0.001082 \n",
"\n",
" Prob Pred Result \n",
"452 0.990653 0 0 \n",
"466 0.987217 0 0 \n",
"461 0.987050 0 0 \n",
"463 0.983444 0 0 \n",
"445 0.972917 0 0 \n",
"462 0.971808 0 0 \n",
"441 0.969697 0 0 \n",
"460 0.969348 0 0 \n",
"470 0.927667 0 0 \n",
"467 0.923608 0 0 \n",
"442 0.901517 0 0 \n",
"454 0.892609 0 0 \n",
"444 0.889239 0 0 \n",
"458 0.889146 0 0 \n",
"443 0.867502 0 0 \n",
"453 0.840642 0 0 \n",
"450 0.831080 0 0 \n",
"449 0.763562 0 0 \n",
"471 0.757709 1 0 \n",
"446 0.737836 0 0 \n",
"455 0.698068 0 0 \n",
"472 0.683771 0 0 \n",
"459 0.633297 0 0 \n",
"451 0.612347 0 0 \n",
"468 0.592556 1 0 \n",
"464 0.573473 0 0 \n",
"447 0.560540 1 0 \n",
"448 0.540216 1 0 \n",
"465 0.517131 1 0 \n",
"469 0.516877 1 0 \n",
"456 0.514721 0 0 \n",
"457 0.502280 1 0 "
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First-round games (DayNum <= 137, which also covers any play-in games) for a given season,\n",
"# sorted by predicted probability\n",
"def get_firstround_year(year):\n",
"    mask = (df_results['DayNum'] <= 137) & (df_results['Season'] == year)\n",
"    return df_results.loc[mask].sort_values(by='Prob', ascending=False)\n",
"\n",
"get_firstround_year(2017)"
]
},
{
"cell_type": "code",
"execution_count": 77,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>TeamName</th>\n",
" <th>Composite Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2964</th>\n",
" <td>2009</td>\n",
" <td>North Carolina</td>\n",
" <td>0.999286</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2963</th>\n",
" <td>2008</td>\n",
" <td>North Carolina</td>\n",
" <td>0.995707</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1100</th>\n",
" <td>2006</td>\n",
" <td>Duke</td>\n",
" <td>0.993241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1966</th>\n",
" <td>2011</td>\n",
" <td>Kansas</td>\n",
" <td>0.990613</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4746</th>\n",
" <td>2017</td>\n",
" <td>Villanova</td>\n",
" <td>0.987834</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1098</th>\n",
" <td>2004</td>\n",
" <td>Duke</td>\n",
" <td>0.987728</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1097</th>\n",
" <td>2003</td>\n",
" <td>Duke</td>\n",
" <td>0.986729</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2027</th>\n",
" <td>2015</td>\n",
" <td>Kentucky</td>\n",
" <td>0.985544</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1965</th>\n",
" <td>2010</td>\n",
" <td>Kansas</td>\n",
" <td>0.983324</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1105</th>\n",
" <td>2011</td>\n",
" <td>Duke</td>\n",
" <td>0.982254</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1099</th>\n",
" <td>2005</td>\n",
" <td>Duke</td>\n",
" <td>0.981524</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1963</th>\n",
" <td>2008</td>\n",
" <td>Kansas</td>\n",
" <td>0.978791</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1761</th>\n",
" <td>2005</td>\n",
" <td>Illinois</td>\n",
" <td>0.977183</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1967</th>\n",
" <td>2012</td>\n",
" <td>Kansas</td>\n",
" <td>0.976465</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104</th>\n",
" <td>2010</td>\n",
" <td>Duke</td>\n",
" <td>0.975412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2962</th>\n",
" <td>2007</td>\n",
" <td>North Carolina</td>\n",
" <td>0.975156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2024</th>\n",
" <td>2012</td>\n",
" <td>Kentucky</td>\n",
" <td>0.975007</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1322</th>\n",
" <td>2014</td>\n",
" <td>Florida</td>\n",
" <td>0.974509</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1971</th>\n",
" <td>2016</td>\n",
" <td>Kansas</td>\n",
" <td>0.972754</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2189</th>\n",
" <td>2014</td>\n",
" <td>Louisville</td>\n",
" <td>0.972702</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season TeamName Composite Score\n",
"2964 2009 North Carolina 0.999286\n",
"2963 2008 North Carolina 0.995707\n",
"1100 2006 Duke 0.993241\n",
"1966 2011 Kansas 0.990613\n",
"4746 2017 Villanova 0.987834\n",
"1098 2004 Duke 0.987728\n",
"1097 2003 Duke 0.986729\n",
"2027 2015 Kentucky 0.985544\n",
"1965 2010 Kansas 0.983324\n",
"1105 2011 Duke 0.982254\n",
"1099 2005 Duke 0.981524\n",
"1963 2008 Kansas 0.978791\n",
"1761 2005 Illinois 0.977183\n",
"1967 2012 Kansas 0.976465\n",
"1104 2010 Duke 0.975412\n",
"2962 2007 North Carolina 0.975156\n",
"2024 2012 Kentucky 0.975007\n",
"1322 2014 Florida 0.974509\n",
"1971 2016 Kansas 0.972754\n",
"2189 2014 Louisville 0.972702"
]
},
"execution_count": 77,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Top 20 single-season composite scores in the data\n",
"df_dummy = teams.rename(columns={'TeamID':'Team_ID'})\n",
"df_scores = pd.merge(left=df, right=df_dummy, how='left', on=['Team_ID'])\n",
"df_scores = (df_scores[['Season', 'TeamName', 'Composite Score']]\n",
"             .sort_values(by='Composite Score', ascending=False))\n",
"df_scores.head(20)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Stage 1 Submission"
]
},
{
"cell_type": "code",
"execution_count": 78,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014_1107_1110</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014_1107_1112</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014_1107_1113</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014_1107_1124</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014_1107_1140</td>\n",
" <td>0.5</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred\n",
"0 2014_1107_1110 0.5\n",
"1 2014_1107_1112 0.5\n",
"2 2014_1107_1113 0.5\n",
"3 2014_1107_1124 0.5\n",
"4 2014_1107_1140 0.5"
]
},
"execution_count": 78,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sample submission dataframe\n",
"data_dir = './March Madness 2018/'\n",
"sample = pd.read_csv(data_dir + 'SampleSubmissionStage1.csv')\n",
"sample.head()"
]
},
{
"cell_type": "code",
"execution_count": 79,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" <th>Season</th>\n",
" <th>Team_ID_Low</th>\n",
" <th>Team_ID_High</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014_1107_1110</td>\n",
" <td>0.5</td>\n",
" <td>2014</td>\n",
" <td>1107</td>\n",
" <td>1110</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014_1107_1112</td>\n",
" <td>0.5</td>\n",
" <td>2014</td>\n",
" <td>1107</td>\n",
" <td>1112</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014_1107_1113</td>\n",
" <td>0.5</td>\n",
" <td>2014</td>\n",
" <td>1107</td>\n",
" <td>1113</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014_1107_1124</td>\n",
" <td>0.5</td>\n",
" <td>2014</td>\n",
" <td>1107</td>\n",
" <td>1124</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014_1107_1140</td>\n",
" <td>0.5</td>\n",
" <td>2014</td>\n",
" <td>1107</td>\n",
" <td>1140</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred Season Team_ID_Low Team_ID_High\n",
"0 2014_1107_1110 0.5 2014 1107 1110\n",
"1 2014_1107_1112 0.5 2014 1107 1112\n",
"2 2014_1107_1113 0.5 2014 1107 1113\n",
"3 2014_1107_1124 0.5 2014 1107 1124\n",
"4 2014_1107_1140 0.5 2014 1107 1140"
]
},
"execution_count": 79,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Split each ID (Season_TeamIDLow_TeamIDHigh) into its three parts; the values stay strings\n",
"id_parts = sample['ID'].str.split('_', expand=True)\n",
"sample['Season'] = id_parts[0]\n",
"sample['Team_ID_Low'] = id_parts[1]\n",
"sample['Team_ID_High'] = id_parts[2]\n",
"sample.head()"
]
},
{
"cell_type": "code",
"execution_count": 80,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Merge composite scores\n",
"# Cast the keys to str so they match the string columns parsed from the sample IDs\n",
"df['Season'] = df['Season'].astype(str)\n",
"df['Team_ID'] = df['Team_ID'].astype(str)\n",
"\n",
"df_lows = df.rename(columns={'Composite Score':'Score', 'Team_ID':'Team_ID_Low'})\n",
"df_highs = df.rename(columns={'Composite Score':'Score', 'Team_ID':'Team_ID_High'})\n",
"\n",
"# Attach the lower-ID team's score, then the higher-ID team's score (pandas suffixes the clash as _x/_y)\n",
"df_dummy = pd.merge(left=sample, right=df_lows, how='left', on=['Season', 'Team_ID_Low'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_highs, on=['Season', 'Team_ID_High'])\n",
"df_sample = df_concat.rename(columns={'Score_x':'Score_Low', 'Score_y':'Score_High'})\n",
"df_sample['Score_Diff'] = df_sample['Score_Low'] - df_sample['Score_High']\n",
"df_full = df_sample\n",
"# .copy() so that assigning 'Pred' later does not trigger SettingWithCopyWarning\n",
"df_sample = df_sample[['ID', 'Score_Low', 'Score_High', 'Score_Diff', 'Pred']].copy()"
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Score_Low</th>\n",
" <th>Score_High</th>\n",
" <th>Score_Diff</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014_1107_1110</td>\n",
" <td>0.490968</td>\n",
" <td>0.589957</td>\n",
" <td>-0.098989</td>\n",
" <td>0.302656</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014_1107_1112</td>\n",
" <td>0.490968</td>\n",
" <td>0.936298</td>\n",
" <td>-0.445330</td>\n",
" <td>0.022864</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014_1110_1112</td>\n",
" <td>0.589957</td>\n",
" <td>0.936298</td>\n",
" <td>-0.346341</td>\n",
" <td>0.051156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014_1107_1113</td>\n",
" <td>0.490968</td>\n",
" <td>0.781062</td>\n",
" <td>-0.290094</td>\n",
" <td>0.079725</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014_1110_1113</td>\n",
" <td>0.589957</td>\n",
" <td>0.781062</td>\n",
" <td>-0.191105</td>\n",
" <td>0.166392</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Score_Low Score_High Score_Diff Pred\n",
"0 2014_1107_1110 0.490968 0.589957 -0.098989 0.302656\n",
"1 2014_1107_1112 0.490968 0.936298 -0.445330 0.022864\n",
"2 2014_1110_1112 0.589957 0.936298 -0.346341 0.051156\n",
"3 2014_1107_1113 0.490968 0.781062 -0.290094 0.079725\n",
"4 2014_1110_1113 0.589957 0.781062 -0.191105 0.166392"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Probabilities from the fitted classifier clf3\n",
"diffs = df_sample['Score_Diff'].values.reshape(-1, 1)\n",
"probs = clf3.predict_proba(diffs)\n",
"# predict_proba returns one column per class; column 1 corresponds to clf3.classes_[1]\n",
"df_sample['Pred'] = probs[:, 1]\n",
"df_sample.head()"
]
},
{
"cell_type": "code",
"execution_count": 82,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2014_1107_1110</td>\n",
" <td>0.302656</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2014_1107_1112</td>\n",
" <td>0.022864</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2014_1110_1112</td>\n",
" <td>0.051156</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2014_1107_1113</td>\n",
" <td>0.079725</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2014_1110_1113</td>\n",
" <td>0.166392</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred\n",
"0 2014_1107_1110 0.302656\n",
"1 2014_1107_1112 0.022864\n",
"2 2014_1110_1112 0.051156\n",
"3 2014_1107_1113 0.079725\n",
"4 2014_1110_1113 0.166392"
]
},
"execution_count": 82,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Submission\n",
"df_submission = df_sample[['ID', 'Pred']]\n",
"df_submission.head()"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Write to csv without the DataFrame index\n",
"df_submission.to_csv('stage1_submission.csv', index=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reformat Data"
]
},
{
"cell_type": "code",
"execution_count": 84,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>W_Composite</th>\n",
" <th>L_Composite</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>0.319824</td>\n",
" <td>0.301622</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>0.963552</td>\n",
" <td>0.508381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>0.825212</td>\n",
" <td>0.853798</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>0.756841</td>\n",
" <td>0.851419</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>0.840379</td>\n",
" <td>0.819419</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID W_Composite L_Composite\n",
"0 2003 134 1421 1411 0.319824 0.301622\n",
"1 2003 136 1112 1436 0.963552 0.508381\n",
"2 2003 136 1113 1272 0.825212 0.853798\n",
"3 2003 136 1141 1166 0.756841 0.851419\n",
"4 2003 136 1143 1301 0.840379 0.819419"
]
},
"execution_count": 84,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rename the Elo columns to match df_topranks, then join Elo and mean ranking per team-season\n",
"season_elos = season_elos.rename(columns={'team_id':'Team_ID', 'season':'Season', 'season_elo':'Elo'})\n",
"df = pd.merge(left=season_elos, right=df_topranks, how='left', on=['Season', 'Team_ID'])\n",
"df.dropna(inplace=True)\n",
"\n",
"# Composite Score = mean of min-max scaled Elo and inverted min-max scaled MeanRank (a lower rank is better)\n",
"scaler = preprocessing.MinMaxScaler(feature_range=(0,1))\n",
"df['Elo_Scaled'] = scaler.fit_transform(df['Elo'].values.reshape(-1,1))\n",
"df['MeanRank_Scaled'] = 1 - scaler.fit_transform(df['MeanRank'].values.reshape(-1,1))\n",
"df['Composite Score'] = (df['Elo_Scaled'] + df['MeanRank_Scaled']) / 2\n",
"\n",
"data_dir = './March Madness 2018/DataFiles/'\n",
"df_tour = pd.read_csv(data_dir + 'NCAATourneyCompactResults.csv')\n",
"df_tour.drop(labels=['WLoc', 'NumOT', 'WScore', 'LScore'], inplace=True, axis=1)\n",
"df.drop(labels=['Elo', 'season_elo', 'MeanRank'], inplace=True, axis=1)\n",
"\n",
"# Attach the winner's and the loser's composite score to each tournament game\n",
"df_win_elos = df.rename(columns={'Team_ID':'WTeamID', 'Composite Score':'W_Composite'})\n",
"df_loss_elos = df.rename(columns={'Team_ID':'LTeamID', 'Composite Score':'L_Composite'})\n",
"df_dummy = pd.merge(left=df_tour, right=df_win_elos, how='left', on=['Season', 'WTeamID'])\n",
"df_concat = pd.merge(left=df_dummy, right=df_loss_elos, on=['Season', 'LTeamID'])\n",
"\n",
"# .copy() so the column assignments in the next cell do not trigger SettingWithCopyWarning\n",
"df_total = df_concat[['Season', 'DayNum', 'WTeamID', 'LTeamID', 'W_Composite', 'L_Composite']].copy()\n",
"df_total.head()"
]
},
{
"cell_type": "code",
"execution_count": 85,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>LTeamID</th>\n",
" <th>TeamID_Upper</th>\n",
" <th>TeamID_Lower</th>\n",
" <th>Composite_Upper</th>\n",
" <th>Composite_Lower</th>\n",
" <th>Composite_Diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>0.319824</td>\n",
" <td>0.301622</td>\n",
" <td>-0.018201</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1112</td>\n",
" <td>1436</td>\n",
" <td>1436</td>\n",
" <td>1112</td>\n",
" <td>0.963552</td>\n",
" <td>0.508381</td>\n",
" <td>-0.455171</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1113</td>\n",
" <td>1272</td>\n",
" <td>1272</td>\n",
" <td>1113</td>\n",
" <td>0.853798</td>\n",
" <td>0.825212</td>\n",
" <td>-0.028586</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1141</td>\n",
" <td>1166</td>\n",
" <td>1166</td>\n",
" <td>1141</td>\n",
" <td>0.851419</td>\n",
" <td>0.756841</td>\n",
" <td>-0.094578</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1143</td>\n",
" <td>1301</td>\n",
" <td>1301</td>\n",
" <td>1143</td>\n",
" <td>0.840379</td>\n",
" <td>0.819419</td>\n",
" <td>-0.020960</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID LTeamID TeamID_Upper TeamID_Lower \\\n",
"0 2003 134 1421 1411 1421 1411 \n",
"1 2003 136 1112 1436 1436 1112 \n",
"2 2003 136 1113 1272 1272 1113 \n",
"3 2003 136 1141 1166 1166 1141 \n",
"4 2003 136 1143 1301 1301 1143 \n",
"\n",
" Composite_Upper Composite_Lower Composite_Diff \n",
"0 0.319824 0.301622 -0.018201 \n",
"1 0.963552 0.508381 -0.455171 \n",
"2 0.853798 0.825212 -0.028586 \n",
"3 0.851419 0.756841 -0.094578 \n",
"4 0.840379 0.819419 -0.020960 "
]
},
"execution_count": 85,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Order each game's teams by ID: Upper = higher TeamID, Lower = lower TeamID\n",
"df_total['TeamID_Upper'] = np.where(df_total['WTeamID'] >= df_total['LTeamID'], df_total['WTeamID'], df_total['LTeamID'])\n",
"df_total['TeamID_Lower'] = np.where(df_total['LTeamID'] >= df_total['WTeamID'], df_total['WTeamID'], df_total['LTeamID'])\n",
"\n",
"# Note: the composites are ordered by value (Upper = larger composite), not by the TeamID\n",
"# ordering above, so Composite_Upper need not belong to TeamID_Upper and Composite_Diff is always <= 0\n",
"df_total['Composite_Upper'] = np.where(df_total['W_Composite'] >= df_total['L_Composite'], df_total['W_Composite'], df_total['L_Composite'])\n",
"df_total['Composite_Lower'] = np.where(df_total['L_Composite'] >= df_total['W_Composite'], df_total['W_Composite'], df_total['L_Composite'])\n",
"\n",
"df_total['Composite_Diff'] = df_total['Composite_Lower'] - df_total['Composite_Upper']\n",
"df_total = df_total[['Season', 'DayNum', 'WTeamID', 'LTeamID', 'TeamID_Upper', 'TeamID_Lower', 'Composite_Upper', 'Composite_Lower', 'Composite_Diff']]\n",
"\n",
"df_total.head()"
]
},
{
"cell_type": "code",
"execution_count": 86,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>TeamID_Upper</th>\n",
" <th>TeamID_Lower</th>\n",
" <th>Composite_Upper</th>\n",
" <th>Composite_Lower</th>\n",
" <th>Composite_Diff</th>\n",
" <th>Result</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2003</td>\n",
" <td>134</td>\n",
" <td>1421</td>\n",
" <td>1411</td>\n",
" <td>0.319824</td>\n",
" <td>0.301622</td>\n",
" <td>-0.018201</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1436</td>\n",
" <td>1112</td>\n",
" <td>0.963552</td>\n",
" <td>0.508381</td>\n",
" <td>-0.455171</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1272</td>\n",
" <td>1113</td>\n",
" <td>0.853798</td>\n",
" <td>0.825212</td>\n",
" <td>-0.028586</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1166</td>\n",
" <td>1141</td>\n",
" <td>0.851419</td>\n",
" <td>0.756841</td>\n",
" <td>-0.094578</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2003</td>\n",
" <td>136</td>\n",
" <td>1301</td>\n",
" <td>1143</td>\n",
" <td>0.840379</td>\n",
" <td>0.819419</td>\n",
" <td>-0.020960</td>\n",
" <td>1</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum TeamID_Upper TeamID_Lower Composite_Upper \\\n",
"0 2003 134 1421 1411 0.319824 \n",
"1 2003 136 1436 1112 0.963552 \n",
"2 2003 136 1272 1113 0.853798 \n",
"3 2003 136 1166 1141 0.851419 \n",
"4 2003 136 1301 1143 0.840379 \n",
"\n",
" Composite_Lower Composite_Diff Result \n",
"0 0.301622 -0.018201 0 \n",
"1 0.508381 -0.455171 1 \n",
"2 0.825212 -0.028586 1 \n",
"3 0.756841 -0.094578 1 \n",
"4 0.819419 -0.020960 1 "
]
},
"execution_count": 86,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_total['Result'] = np.where(df_total['WTeamID'] == df_total['TeamID_Lower'], 1, 0)\n",
"df_predictions = df_total.drop(['WTeamID', 'LTeamID'], axis=1)\n",
"df_predictions.head()"
]
},
{
"cell_type": "code",
"execution_count": 87,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.69278746859047591"
]
},
"execution_count": 87,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_train = df_predictions.loc[df_predictions['Season'] < 2014]\n",
"df_test = df_predictions.loc[df_predictions['Season'] >= 2014]\n",
"\n",
"X_train = df_train['Composite_Diff'].values.reshape(-1,1)\n",
"Y_train = df_train['Result'].values\n",
"\n",
"X_test = df_test['Composite_Diff'].values.reshape(-1,1)\n",
"Y_test = df_test['Result'].values\n",
"\n",
"logreg = LogisticRegression()\n",
"params = {'C': np.logspace(start=-5, stop=5, num=10)}\n",
"clf3 = GridSearchCV(logreg, params, scoring='neg_log_loss', refit=True)\n",
"clf3.fit(X_train, Y_train)\n",
"clf3.score(X_train, Y_train)"
]
},
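{
"cell_type": "markdown",
"metadata": {},
"source": [
"`GridSearchCV` with `scoring='neg_log_loss'` reports the negative log loss, so the score above corresponds to a log loss of about 0.6928 on the training seasons. A constant prediction of 0.5 scores ln 2, about 0.6931, so that fit is essentially at coin-flip level. The held-out seasons in `X_test`/`Y_test` are also never scored. The cell below is a minimal sketch of comparing a model against that baseline on held-out data. It uses synthetic features and labels; the data-generating process is invented purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import numpy as np\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.metrics import log_loss\n",
"\n",
"rng = np.random.RandomState(0)\n",
"\n",
"# Synthetic composite differences and outcomes (hypothetical data)\n",
"X = rng.normal(0, 0.3, size=(400, 1))\n",
"p_true = 1 / (1 + np.exp(-5 * X[:, 0]))\n",
"y = (rng.uniform(size=400) < p_true).astype(int)\n",
"X_tr, X_te, y_tr, y_te = X[:300], X[300:], y[:300], y[300:]\n",
"\n",
"# Baseline: predict 0.5 for every held-out game\n",
"baseline = log_loss(y_te, np.full(len(y_te), 0.5), labels=[0, 1])\n",
"\n",
"# Fit on the training part, score on games the model has not seen\n",
"model = LogisticRegression(C=1.0).fit(X_tr, y_tr)\n",
"held_out = log_loss(y_te, model.predict_proba(X_te)[:, 1])\n",
"\n",
"print(round(baseline, 4), round(held_out, 4))"
]
},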
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Simplified Elements From Above"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_teams_df(year):\n",
"    # Get every team that played a regular-season game in the given season\n",
"    stage2_dir = './March Madness 2018/Stage2UpdatedDataFiles/'\n",
"    df = pd.read_csv(stage2_dir + 'RegularSeasonCompactResults.csv')\n",
"\n",
"    df = df.loc[df['Season'] == year]\n",
"    team_ids = set(df.WTeamID).union(set(df.LTeamID))\n",
"    team_list = list(team_ids)\n",
"    teams = pd.DataFrame({'Team_ID':team_list})\n",
"    teams['Season'] = year\n",
"    teams = teams[['Season', 'Team_ID']]\n",
"    return(teams)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_team_name(team_id):\n",
"    # Get the school name for a given team ID\n",
"    stage2_dir = './March Madness 2018/Stage2UpdatedDataFiles/'\n",
"    teams = pd.read_csv(stage2_dir + 'teams.csv')\n",
"    name = teams.loc[teams['TeamID'] == team_id]['TeamName']\n",
"    return(name.values[0])"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_team_id(name):\n",
"    # Get the team ID for a given school name\n",
"    stage2_dir = './March Madness 2018/Stage2UpdatedDataFiles/'\n",
"    teams = pd.read_csv(stage2_dir + 'teams.csv')\n",
"    team_id = teams.loc[teams['TeamName'] == name]['TeamID']\n",
"    return(team_id.values[0])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def elo_pred(elo1, elo2):\n",
"    # Expected probability that the team rated elo1 beats the team rated elo2\n",
"    return(1. / (10. ** (-(elo1 - elo2) / 400.) + 1.))\n",
"\n",
"def expected_margin(elo_diff):\n",
"    # Denominator of the margin-of-victory multiplier\n",
"    return((7.5 + 0.006 * elo_diff))\n",
"\n",
"def elo_update(w_elo, l_elo, margin, K):\n",
"    elo_diff = w_elo - l_elo\n",
"    pred = elo_pred(w_elo, l_elo)\n",
"    # Scale the update by the margin of victory\n",
"    mult = ((margin + 3.) ** 0.8) / expected_margin(elo_diff)\n",
"    update = K * mult * (1 - pred)\n",
"    return(pred, update)\n",
"\n",
"def final_elo_per_season(df, team_id):\n",
"    # Return the given team's Elo after its last game in each season\n",
"    d = df.copy()\n",
"    d = d.loc[(d.WTeamID == team_id) | (d.LTeamID == team_id), :]\n",
"    d.sort_values(['Season', 'DayNum'], inplace=True)\n",
"    d.drop_duplicates(['Season'], keep='last', inplace=True)\n",
"    w_mask = d.WTeamID == team_id\n",
"    l_mask = d.LTeamID == team_id\n",
"    d['season_elo'] = None\n",
"    d.loc[w_mask, 'season_elo'] = d.loc[w_mask, 'w_elo']\n",
"    d.loc[l_mask, 'season_elo'] = d.loc[l_mask, 'l_elo']\n",
"    out = pd.DataFrame({\n",
"        'team_id': team_id,\n",
"        'season': d.Season,\n",
"        'season_elo': d.season_elo\n",
"    })\n",
"    return(out)"
]
},
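{
"cell_type": "markdown",
"metadata": {},
"source": [
"A worked example of the Elo formulas above. They are re-implemented under the hypothetical names `win_prob` and `rating_change` so the cell runs on its own. A 100-point edge gives the favorite a win probability of 1 / (10^(-100/400) + 1), about 0.64. The rating update grows with the margin of victory and is larger when the result is an upset, since both (1 - pred) and the margin multiplier increase."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def win_prob(elo_a, elo_b):\n",
"    # Same logistic curve as elo_pred: probability that team a beats team b\n",
"    return 1. / (10. ** (-(elo_a - elo_b) / 400.) + 1.)\n",
"\n",
"def rating_change(w_elo, l_elo, margin, k):\n",
"    # Same form as elo_update: K * margin multiplier * (1 - expected win probability)\n",
"    pred = win_prob(w_elo, l_elo)\n",
"    mult = ((margin + 3.) ** 0.8) / (7.5 + 0.006 * (w_elo - l_elo))\n",
"    return k * mult * (1 - pred)\n",
"\n",
"p_fav = win_prob(1600, 1500)\n",
"p_dog = win_prob(1500, 1600)\n",
"expected_win = rating_change(1600, 1500, margin=10, k=22)  # favorite wins by 10\n",
"upset = rating_change(1500, 1600, margin=10, k=22)  # underdog wins by 10\n",
"\n",
"print(round(p_fav, 3), round(p_dog, 3))\n",
"print(round(expected_win, 2), round(upset, 2))"
]
},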
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_elos_df(year):\n",
"    # Data\n",
"    stage2_dir = './March Madness 2018/Stage2UpdatedDataFiles/'\n",
"    df = pd.read_csv(stage2_dir + 'RegularSeasonCompactResults.csv')\n",
"\n",
"    # Constants\n",
"    HOME_ADVANTAGE = 100\n",
"    K = 22\n",
"    # New frame with a 0..n-1 index so the positional lookups below work\n",
"    rs = df.loc[df['Season'] == year].reset_index(drop=True)\n",
"\n",
"    # Dictionary for lookups: every team starts the season at 1500\n",
"    team_ids = set(rs.WTeamID).union(set(rs.LTeamID))\n",
"    elo_dict = dict(zip(list(team_ids), [1500] * len(team_ids)))\n",
"\n",
"    # Set up columns\n",
"    rs['margin'] = rs.WScore - rs.LScore\n",
"    rs['w_elo'] = None\n",
"    rs['l_elo'] = None\n",
"\n",
"    # Iterate through regular season\n",
"    preds = []\n",
"    for i in range(rs.shape[0]):\n",
"\n",
"        # Get key data from current row\n",
"        w = rs.at[i, 'WTeamID']\n",
"        l = rs.at[i, 'LTeamID']\n",
"        margin = rs.at[i, 'margin']\n",
"        wloc = rs.at[i, 'WLoc']\n",
"\n",
"        # Does either team get a home-court advantage?\n",
"        w_ad, l_ad = 0., 0.\n",
"        if wloc == \"H\":\n",
"            w_ad += HOME_ADVANTAGE\n",
"        elif wloc == \"A\":\n",
"            l_ad += HOME_ADVANTAGE\n",
"\n",
"        # Get elo updates as a result of the game\n",
"        pred, update = elo_update(elo_dict[w] + w_ad,\n",
"                                  elo_dict[l] + l_ad,\n",
"                                  margin, K)\n",
"        elo_dict[w] += update\n",
"        elo_dict[l] -= update\n",
"        preds.append(pred)\n",
"\n",
"        # Store the new Elos in the games dataframe\n",
"        rs.loc[i, 'w_elo'] = elo_dict[w]\n",
"        rs.loc[i, 'l_elo'] = elo_dict[l]\n",
"\n",
"    # Create and return final elo dataframe\n",
"    df_list = [final_elo_per_season(rs, i) for i in team_ids]\n",
"    season_elos = pd.concat(df_list)\n",
"    season_elos.rename(columns={'season':'Season', 'team_id':'Team_ID', 'season_elo':'Elo'}, inplace = True)\n",
"    return(season_elos)"
]
},
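{
"cell_type": "markdown",
"metadata": {},
"source": [
"A stripped-down sketch of the season loop in `get_elos_df`, run on three hypothetical games between teams 1, 2 and 3. The games and the names `home_bump`, `k_factor` and `win_prob` are made up for illustration. Every team starts at 1500, and the 100-point home-court bump is applied to the prediction but not stored. Each update is added to the winner and subtracted from the loser, so the ratings always sum to 1500 times the number of teams."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical constants and games, for illustration only\n",
"home_bump = 100.\n",
"k_factor = 22.\n",
"\n",
"def win_prob(elo_a, elo_b):\n",
"    # Same logistic curve as elo_pred\n",
"    return 1. / (10. ** (-(elo_a - elo_b) / 400.) + 1.)\n",
"\n",
"# (winner, loser, winner score, loser score, WLoc)\n",
"games = [(1, 2, 70, 60, 'H'), (3, 1, 65, 64, 'A'), (2, 3, 80, 70, 'N')]\n",
"\n",
"ratings = {team: 1500. for g in games for team in g[:2]}\n",
"for w, l, w_score, l_score, wloc in games:\n",
"    # Home-court bump enters the prediction only, not the stored rating\n",
"    w_ad = home_bump if wloc == 'H' else 0.\n",
"    l_ad = home_bump if wloc == 'A' else 0.\n",
"    w_elo, l_elo = ratings[w] + w_ad, ratings[l] + l_ad\n",
"    pred = win_prob(w_elo, l_elo)\n",
"    mult = ((w_score - l_score + 3.) ** 0.8) / (7.5 + 0.006 * (w_elo - l_elo))\n",
"    update = k_factor * mult * (1 - pred)\n",
"    # Zero-sum: the winner gains exactly what the loser gives up\n",
"    ratings[w] += update\n",
"    ratings[l] -= update\n",
"\n",
"print({team: round(r, 1) for team, r in sorted(ratings.items())})"
]
},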
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_elo_score(elos_df, year, team_id):\n",
"    # Return the final Elo for a team in a given season\n",
"    # (column names match the renamed output of get_elos_df)\n",
"    score = elos_df.loc[(elos_df['Season'] == year) & (elos_df['Team_ID'] == team_id)]['Elo']\n",
"    return(score)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_select_ranks_df(year, day):\n",
"\n",
"    # Massey ordinals through day 128, plus the final (day 133) rankings\n",
"    data_dir = './March Madness 2018/'\n",
"    df = pd.read_csv(data_dir + 'MasseyOrdinals_thruSeason2018_Day128.csv')\n",
"    df2 = pd.read_csv(data_dir + 'MasseyOrdinals_2018_133_only_53Systems.csv')\n",
"    # DataFrame.append was removed in pandas 2.0; pd.concat is equivalent here\n",
"    df = pd.concat([df, df2])\n",
"\n",
"    # Set up\n",
"    teams = get_teams_df(year)\n",
"    df_massey = df.loc[df['Season'] == year]\n",
"\n",
"    # One column per selected ranking system for the requested day\n",
"    for system in ['SAG', 'WLK', 'POM', 'MOR']:\n",
"        df_temp = df_massey.loc[(df_massey['RankingDayNum'] == day) & (df_massey['SystemName'] == system)]\n",
"        df_temp = df_temp.drop(labels=['RankingDayNum', 'SystemName'], axis=1)\n",
"        df_temp = df_temp.rename(columns={'OrdinalRank':system, 'TeamID':'Team_ID'})\n",
"        teams = pd.merge(left=teams, right=df_temp, how='left', on=['Season', 'Team_ID'])\n",
"\n",
"    # Calculate mean rank; teams missing any of the four systems are dropped\n",
"    teams['MeanRank'] = (teams['SAG'] + teams['WLK'] + teams['POM'] + teams['MOR']) / 4\n",
"    teams.dropna(inplace = True)\n",
"    massey_df = teams\n",
"    return(massey_df)"
]
},
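{
"cell_type": "markdown",
"metadata": {},
"source": [
"An alternative sketch for gathering several ranking systems at once. It filters the long-format ordinals to one day and the chosen systems, then pivots `SystemName` into columns instead of building and merging four frames. The ordinals below are hypothetical; the column names follow the MasseyOrdinals files read above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical ordinals in the same long format as the MasseyOrdinals CSVs\n",
"ordinals = pd.DataFrame({\n",
"    'Season': [2018] * 8,\n",
"    'RankingDayNum': [133] * 8,\n",
"    'SystemName': ['SAG', 'WLK', 'POM', 'MOR'] * 2,\n",
"    'TeamID': [1437] * 4 + [1438] * 4,\n",
"    'OrdinalRank': [1, 2, 2, 3, 2, 1, 1, 2],\n",
"})\n",
"\n",
"systems = ['SAG', 'WLK', 'POM', 'MOR']\n",
"ranking_day = 133\n",
"subset = ordinals[(ordinals['RankingDayNum'] == ranking_day) & ordinals['SystemName'].isin(systems)]\n",
"\n",
"# One row per team, one column per system\n",
"wide = subset.pivot_table(index=['Season', 'TeamID'], columns='SystemName',\n",
"                          values='OrdinalRank').reset_index()\n",
"wide.columns.name = None\n",
"wide = wide.rename(columns={'TeamID': 'Team_ID'})\n",
"\n",
"# skipna=False: a team missing any system gets NaN and is then dropped\n",
"wide['MeanRank'] = wide[systems].mean(axis=1, skipna=False)\n",
"wide = wide.dropna()\n",
"print(wide)"
]
},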
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_select_rank(massey_df, year, day, team_id):\n",
"    score = massey_df.loc[(massey_df['Season'] == year) & (massey_df['Team_ID'] == team_id)]['MeanRank']\n",
"    return(score)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_composite_scores_df(year):\n",
"    # Get dataframe with composite scores for all teams\n",
"    FINAL_DAY = 133\n",
"    df = get_teams_df(year)\n",
"    ranks = get_select_ranks_df(year, FINAL_DAY)\n",
"    season_elos = get_elos_df(year)\n",
"\n",
"    df = pd.merge(left=df, right=season_elos, how='left', on=['Season', 'Team_ID'])\n",
"    df = pd.merge(left=df, right=ranks, how='left', on=['Season', 'Team_ID'])\n",
"    df = df[['Season', 'Team_ID', 'MeanRank', 'Elo']].copy()\n",
"\n",
"    # Normalize features to [0, 1]; rank is flipped so that 1 = best\n",
"    scaler = preprocessing.MinMaxScaler(feature_range=(0,1))\n",
"    df['Elo_Scaled'] = scaler.fit_transform(df['Elo'].values.reshape(-1,1))\n",
"    df['MeanRank_Scaled'] = 1 - scaler.fit_transform(df['MeanRank'].values.reshape(-1,1))\n",
"\n",
"    # Weighted average: rank counts twice as much as Elo\n",
"    df['Composite Score'] = (df['Elo_Scaled'] + (2 * df['MeanRank_Scaled'])) / 3\n",
"    df = df[['Season', 'Team_ID', 'Composite Score']]\n",
"    final_scores = df\n",
"    return(final_scores)"
]
},
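{
"cell_type": "markdown",
"metadata": {},
"source": [
"A worked example of the composite formula on three hypothetical teams. Elo is min-max scaled so the highest rating maps to 1. Mean rank is min-max scaled and then flipped (1 minus the scaled value), because a smaller rank is better. The composite then weights rank twice as heavily as Elo: Composite = (Elo_Scaled + 2 * MeanRank_Scaled) / 3."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from sklearn import preprocessing\n",
"\n",
"# Hypothetical teams: higher Elo is better, lower MeanRank is better\n",
"toy = pd.DataFrame({'Elo': [1900., 1700., 1500.], 'MeanRank': [1., 50., 300.]})\n",
"\n",
"scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))\n",
"toy['Elo_Scaled'] = scaler.fit_transform(toy[['Elo']]).ravel()\n",
"# Flip the scaled rank so the best-ranked team gets 1\n",
"toy['MeanRank_Scaled'] = 1 - scaler.fit_transform(toy[['MeanRank']]).ravel()\n",
"toy['Composite Score'] = (toy['Elo_Scaled'] + 2 * toy['MeanRank_Scaled']) / 3\n",
"print(toy.round(3))"
]
},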
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def generate_probs(model, year):\n",
"    # Parse submission IDs (Season_LowerTeamID_HigherTeamID) into columns\n",
"    data_dir = './March Madness 2018/'\n",
"    sample = pd.read_csv(data_dir + 'SampleSubmissionStage2.csv')\n",
"    sample['Season'] = sample['ID'].str[0:4]\n",
"    sample['Team_ID_Low'] = sample['ID'].str[5:9]\n",
"    sample['Team_ID_High'] = sample['ID'].str[10:14]\n",
"\n",
"    # Composite scores for the requested season, with string keys to match the IDs\n",
"    df = get_composite_scores_df(year)\n",
"    df['Season'] = df['Season'].astype(str)\n",
"    df['Team_ID'] = df['Team_ID'].astype(str)\n",
"\n",
"    df_lows = df.rename(columns={'Composite Score':'Score_Low', 'Team_ID':'Team_ID_Low'})\n",
"    df_highs = df.rename(columns={'Composite Score':'Score_High', 'Team_ID':'Team_ID_High'})\n",
"\n",
"    # Attach each team's composite score, then take lower-ID minus higher-ID\n",
"    df_sample = pd.merge(left=sample, right=df_lows, how='left', on=['Season', 'Team_ID_Low'])\n",
"    df_sample = pd.merge(left=df_sample, right=df_highs, on=['Season', 'Team_ID_High'])\n",
"    df_sample['Score_Diff'] = df_sample['Score_Low'] - df_sample['Score_High']\n",
"\n",
"    # Probability that the lower-ID team wins (class 1 in training)\n",
"    diffs = df_sample['Score_Diff'].values.reshape(-1,1)\n",
"    df_sample['Pred'] = model.predict_proba(diffs)[:, 1]\n",
"    df_sample = df_sample[['ID', 'Pred']]\n",
"    return(df_sample)"
]
},
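{
"cell_type": "markdown",
"metadata": {},
"source": [
"Submission IDs have the form Season_LowerTeamID_HigherTeamID (for example 2018_1104_1112), and `Pred` is read as the probability that the lower-ID team wins. The cell below is a minimal sketch of parsing the IDs with vectorized string methods instead of a row-wise `apply`. The three IDs are taken from the sample output further below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"ids = pd.DataFrame({'ID': ['2018_1104_1112', '2018_1104_1113', '2018_1112_1116']})\n",
"\n",
"# expand=True returns one column per underscore-separated field\n",
"parts = ids['ID'].str.split('_', expand=True)\n",
"ids['Season'] = parts[0]\n",
"ids['Team_ID_Low'] = parts[1]\n",
"ids['Team_ID_High'] = parts[2]\n",
"print(ids)"
]
},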
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 2018 Results EDA"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>DayNum</th>\n",
" <th>WTeamID</th>\n",
" <th>WScore</th>\n",
" <th>LTeamID</th>\n",
" <th>LScore</th>\n",
" <th>WLoc</th>\n",
" <th>NumOT</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1985</td>\n",
" <td>20</td>\n",
" <td>1228</td>\n",
" <td>81</td>\n",
" <td>1328</td>\n",
" <td>64</td>\n",
" <td>N</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1106</td>\n",
" <td>77</td>\n",
" <td>1354</td>\n",
" <td>70</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1112</td>\n",
" <td>63</td>\n",
" <td>1223</td>\n",
" <td>56</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1165</td>\n",
" <td>70</td>\n",
" <td>1432</td>\n",
" <td>54</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1985</td>\n",
" <td>25</td>\n",
" <td>1192</td>\n",
" <td>86</td>\n",
" <td>1447</td>\n",
" <td>74</td>\n",
" <td>H</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season DayNum WTeamID WScore LTeamID LScore WLoc NumOT\n",
"0 1985 20 1228 81 1328 64 N 0\n",
"1 1985 25 1106 77 1354 70 H 0\n",
"2 1985 25 1112 63 1223 56 H 0\n",
"3 1985 25 1165 70 1432 54 H 0\n",
"4 1985 25 1192 86 1447 74 H 0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import data\n",
"stage2_dir = './March Madness 2018/Stage2UpdatedDataFiles/'\n",
"df = pd.read_csv(stage2_dir + 'RegularSeasonCompactResults.csv')\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Team_ID</th>\n",
" <th>Composite Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2018</td>\n",
" <td>1101</td>\n",
" <td>0.295815</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2018</td>\n",
" <td>1102</td>\n",
" <td>0.328263</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2018</td>\n",
" <td>1103</td>\n",
" <td>0.292680</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2018</td>\n",
" <td>1104</td>\n",
" <td>0.780949</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2018</td>\n",
" <td>1105</td>\n",
" <td>0.000265</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Team_ID Composite Score\n",
"0 2018 1101 0.295815\n",
"1 2018 1102 0.328263\n",
"2 2018 1103 0.292680\n",
"3 2018 1104 0.780949\n",
"4 2018 1105 0.000265"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run composite score functions\n",
"final_scores = get_composite_scores_df(2018)\n",
"final_scores.head()"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Pull team names and format data\n",
"final_teams = final_scores.copy()\n",
"pd.options.display.float_format = '{:.3f}'.format\n",
"\n",
"final_teams['Team Name'] = final_teams['Team_ID'].apply(get_team_name)\n",
"\n",
"final_teams = final_teams[['Season', 'Team Name', 'Composite Score']]\n",
"final_teams = final_teams.sort_values(by='Composite Score', ascending = False)\n",
"\n",
"final_teams.reset_index(inplace = True, drop = True)\n",
"final_teams.index += 1"
]
},
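{
"cell_type": "markdown",
"metadata": {},
"source": [
"`get_team_name` rereads teams.csv on every call, so looking up a few hundred teams reads the file a few hundred times. The cell below sketches the same lookup done once with a dictionary and `Series.map`. The small frames are hypothetical stand-ins for teams.csv (with the `TeamID`/`TeamName` columns used above) and for `final_scores`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical stand-ins for teams.csv and final_scores\n",
"teams_table = pd.DataFrame({'TeamID': [1101, 1102, 1103],\n",
"                            'TeamName': ['Team A', 'Team B', 'Team C']})\n",
"scores = pd.DataFrame({'Season': [2018] * 3, 'Team_ID': [1103, 1101, 1102],\n",
"                       'Composite Score': [0.9, 0.3, 0.6]})\n",
"\n",
"# Build the ID -> name lookup once instead of reading the CSV for every team\n",
"id_to_name = dict(zip(teams_table['TeamID'], teams_table['TeamName']))\n",
"ranked = scores.copy()\n",
"ranked['Team Name'] = ranked['Team_ID'].map(id_to_name)\n",
"ranked = ranked.sort_values('Composite Score', ascending=False).reset_index(drop=True)\n",
"ranked.index += 1  # 1-based rank, as in the cell above\n",
"print(ranked[['Season', 'Team Name', 'Composite Score']])"
]
},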
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Season</th>\n",
" <th>Team Name</th>\n",
" <th>Composite Score</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2018</td>\n",
" <td>Villanova</td>\n",
" <td>1.000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2018</td>\n",
" <td>Virginia</td>\n",
" <td>0.997</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2018</td>\n",
" <td>Cincinnati</td>\n",
" <td>0.976</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2018</td>\n",
" <td>Gonzaga</td>\n",
" <td>0.975</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2018</td>\n",
" <td>Duke</td>\n",
" <td>0.962</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2018</td>\n",
" <td>Purdue</td>\n",
" <td>0.960</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2018</td>\n",
" <td>Michigan St</td>\n",
" <td>0.953</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2018</td>\n",
" <td>Michigan</td>\n",
" <td>0.947</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2018</td>\n",
" <td>North Carolina</td>\n",
" <td>0.930</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>2018</td>\n",
" <td>Kansas</td>\n",
" <td>0.926</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>2018</td>\n",
" <td>Xavier</td>\n",
" <td>0.924</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>2018</td>\n",
" <td>Houston</td>\n",
" <td>0.919</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>2018</td>\n",
" <td>Arizona</td>\n",
" <td>0.915</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>2018</td>\n",
" <td>Tennessee</td>\n",
" <td>0.913</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>2018</td>\n",
" <td>Wichita St</td>\n",
" <td>0.909</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>2018</td>\n",
" <td>Texas Tech</td>\n",
" <td>0.900</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>2018</td>\n",
" <td>West Virginia</td>\n",
" <td>0.897</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>2018</td>\n",
" <td>Kentucky</td>\n",
" <td>0.895</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>2018</td>\n",
" <td>Ohio St</td>\n",
" <td>0.888</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>2018</td>\n",
" <td>Auburn</td>\n",
" <td>0.883</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>2018</td>\n",
" <td>Nevada</td>\n",
" <td>0.881</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>2018</td>\n",
" <td>Clemson</td>\n",
" <td>0.871</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>2018</td>\n",
" <td>St Mary's CA</td>\n",
" <td>0.856</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>2018</td>\n",
" <td>TCU</td>\n",
" <td>0.856</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>2018</td>\n",
" <td>Florida</td>\n",
" <td>0.854</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Season Team Name Composite Score\n",
"1 2018 Villanova 1.000\n",
"2 2018 Virginia 0.997\n",
"3 2018 Cincinnati 0.976\n",
"4 2018 Gonzaga 0.975\n",
"5 2018 Duke 0.962\n",
"6 2018 Purdue 0.960\n",
"7 2018 Michigan St 0.953\n",
"8 2018 Michigan 0.947\n",
"9 2018 North Carolina 0.930\n",
"10 2018 Kansas 0.926\n",
"11 2018 Xavier 0.924\n",
"12 2018 Houston 0.919\n",
"13 2018 Arizona 0.915\n",
"14 2018 Tennessee 0.913\n",
"15 2018 Wichita St 0.909\n",
"16 2018 Texas Tech 0.900\n",
"17 2018 West Virginia 0.897\n",
"18 2018 Kentucky 0.895\n",
"19 2018 Ohio St 0.888\n",
"20 2018 Auburn 0.883\n",
"21 2018 Nevada 0.881\n",
"22 2018 Clemson 0.871\n",
"23 2018 St Mary's CA 0.856\n",
"24 2018 TCU 0.856\n",
"25 2018 Florida 0.854"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Look at rankings\n",
"final_teams.head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Submission I"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2018_1104_1112</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2018_1104_1113</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2018_1104_1116</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2018_1104_1120</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2018_1104_1137</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred\n",
"0 2018_1104_1112 0.500\n",
"1 2018_1104_1113 0.500\n",
"2 2018_1104_1116 0.500\n",
"3 2018_1104_1120 0.500\n",
"4 2018_1104_1137 0.500"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Sample data\n",
"data_dir = './March Madness 2018/'\n",
"sample = pd.read_csv(data_dir + 'SampleSubmissionStage2.csv')\n",
"sample.head()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2018_1104_1112</td>\n",
" <td>0.237</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2018_1104_1113</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2018_1112_1113</td>\n",
" <td>0.763</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2018_1104_1116</td>\n",
" <td>0.400</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2018_1112_1116</td>\n",
" <td>0.682</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred\n",
"0 2018_1104_1112 0.237\n",
"1 2018_1104_1113 0.500\n",
"2 2018_1112_1113 0.763\n",
"3 2018_1104_1116 0.400\n",
"4 2018_1112_1116 0.682"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get probability dataframe\n",
"with open('ncaa_tourney1.pkl', 'rb') as f:\n",
"    mod1 = pickle.load(f)\n",
"pred = generate_probs(mod1, 2018)\n",
"pred.head()"
]
},
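{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before writing a CSV, a few sanity checks may catch formatting problems. The cell below is a minimal sketch, run on a hypothetical stand-in for `pred`. It checks that IDs are unique and list the lower team ID first. Because the competition is scored with log loss, a confident wrong prediction near 0 or 1 costs a very large penalty, so clipping is one common safeguard. The clip bounds 0.025/0.975 and the name `check_submission` are arbitrary illustrative choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical stand-in for the pred frame returned by generate_probs\n",
"sub = pd.DataFrame({'ID': ['2018_1104_1112', '2018_1104_1113', '2018_1112_1113'],\n",
"                    'Pred': [0.0, 0.5, 1.0]})\n",
"\n",
"def check_submission(df, lo=0.025, hi=0.975):\n",
"    # IDs must be unique, with the lower team ID listed first\n",
"    assert df['ID'].is_unique\n",
"    parts = df['ID'].str.split('_', expand=True).astype(int)\n",
"    assert (parts[1] < parts[2]).all()\n",
"    # Clip extreme probabilities; lo and hi are arbitrary illustrative bounds\n",
"    out = df.copy()\n",
"    out['Pred'] = np.clip(out['Pred'], lo, hi)\n",
"    return out\n",
"\n",
"checked = check_submission(sub)\n",
"print(checked)"
]
},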
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Write to csv\n",
"pred.to_csv('stage2_submission1.csv', index=None)"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Submission II"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style>\n",
" .dataframe thead tr:only-child th {\n",
" text-align: right;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: left;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ID</th>\n",
" <th>Pred</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2018_1104_1112</td>\n",
" <td>0.244</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2018_1104_1113</td>\n",
" <td>0.500</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2018_1112_1113</td>\n",
" <td>0.755</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2018_1104_1116</td>\n",
" <td>0.404</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2018_1112_1116</td>\n",
" <td>0.677</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ID Pred\n",
"0 2018_1104_1112 0.244\n",
"1 2018_1104_1113 0.500\n",
"2 2018_1112_1113 0.755\n",
"3 2018_1104_1116 0.404\n",
"4 2018_1112_1116 0.677"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Get probability dataframe\n",
"with open('ncaa_tourney2.pkl', 'rb') as f:\n",
"    mod2 = pickle.load(f)\n",
"pred = generate_probs(mod2, 2018)\n",
"pred.head()"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Write to csv\n",
"pred.to_csv('stage2_submission2.csv', index=None)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}