@isc-rsingh
Created April 20, 2018 13:15
{
"nbformat_minor": 1,
"cells": [
{
"source": "## Introduction to Machine Learning with PostgreSQL\n### Predicting NBA Winners",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "<a id=\"model\"></a>\n## 1. Load the data and create the model",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "### 1.1: Load data from PostgreSQL",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 31,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "labelCol = 'homeTeamWin'\ntraining_table = 'nba_training_data'"
},
{
"execution_count": 32,
"cell_type": "code",
"metadata": {},
"outputs": [],
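"source": "import pixiedust\nfrom pyspark.sql import SparkSession, SQLContext\nimport pandas as pd\n\n# Reuse the notebook's Spark session if one exists, otherwise create one.\nspark = SparkSession.builder.getOrCreate()\n# Build the SQLContext from the session's SparkContext instead of relying on a predefined `sc`.\nsqlCtx = SQLContext(spark.sparkContext)"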
},
{
"source": "**Tip:** All required fields can be found on the Service Credentials tab of the Compose Postgres service instance you created in Bluemix. If you do not have credentials, create them by clicking \"New credentials\".",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 33,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "pg_host = 'sl-us-south-1-portal.3.dblayer.com:18447/compose'\npg_uname = \"xxxxxxxx\"\npg_pword = \"xxxxxxxx\""
},
{
"execution_count": 34,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "# The code was removed by DSX for sharing."
},
{
"execution_count": 35,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from sqlalchemy import create_engine\nengine = create_engine(\"postgresql://\"+pg_uname+\":\"+pg_pword+\"@\"+pg_host)"
},
{
"execution_count": 36,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "# Read the training table into pandas, then convert it to a Spark DataFrame.\ndf = pd.read_sql_query('select * from ' + training_table, con=engine)\ndf = sqlCtx.createDataFrame(df)"
},
{
"execution_count": 37,
"cell_type": "code",
"metadata": {
"pixiedust": {
"displayParams": {
"tableFields": "awayTeam,awayTeamDaysOff,awayTeamLastFive,awayTeamLastTen,awayTeamTotalWinPercent,date,homeTeam,homeTeamDaysOff,homeTeamLastFive,homeTeamLastTen,homeTeamTotalWinPercent,homeTeamWin",
"filter": "{}",
"handlerId": "tableView",
"no_margin": "true",
"table_noschema": "true",
"table_nocount": "true",
"table_nosearch": "true"
}
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": "<style type=\"text/css\">.pd_warning{display:none;}</style><div class=\"pd_warning\"><em>Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter</em></div>\n <div class=\"pd_save is-viewer-good\" style=\"padding-right:10px;text-align: center;line-height:initial !important;font-size: xx-large;font-weight: 500;color: coral;\">\n \n </div>\n <div id=\"chartFigure98230c54\" class=\"pd_save is-viewer-good\" style=\"overflow-x:auto\">\n <style type=\"text/css\" class=\"pd_save\">\n .df-table-wrapper .panel-heading {\n border-radius: 0;\n padding: 0px;\n }\n .df-table-wrapper .panel-heading:hover {\n border-color: #008571;\n }\n .df-table-wrapper .panel-title a {\n background-color: #f9f9fb;\n color: #333333;\n display: block;\n outline: none;\n padding: 10px 15px;\n text-decoration: none;\n }\n .df-table-wrapper .panel-title a:hover {\n background-color: #337ab7;\n border-color: #2e6da4;\n color: #ffffff;\n display: block;\n padding: 10px 15px;\n text-decoration: none;\n }\n .df-table-wrapper {\n font-size: small;\n font-weight: 300;\n letter-spacing: 0.5px;\n line-height: normal;\n height: inherit;\n overflow: auto;\n }\n .df-table-search {\n margin: 0 0 20px 0;\n }\n .df-table-search-count {\n display: inline-block;\n margin: 0 0 20px 0;\n }\n .df-table-container {\n max-height: 50vh;\n max-width: 100%;\n overflow-x: auto;\n position: relative;\n }\n .df-table-wrapper table {\n border: 0 none #ffffff;\n border-collapse: collapse;\n margin: 0;\n min-width: 100%;\n padding: 0;\n table-layout: fixed;\n height: inherit;\n overflow: auto;\n }\n .df-table-wrapper tr.hidden {\n display: none;\n }\n .df-table-wrapper tr:nth-child(even) {\n background-color: #f9f9fb;\n }\n .df-table-wrapper tr.even {\n background-color: #f9f9fb;\n }\n .df-table-wrapper tr.odd {\n background-color: #ffffff;\n }\n .df-table-wrapper td + td {\n border-left: 1px solid #e0e0e0;\n }\n \n .df-table-wrapper thead,\n .fixed-header {\n 
font-weight: 600;\n }\n .df-table-wrapper tr,\n .fixed-row {\n border: 0 none #ffffff;\n margin: 0;\n padding: 0;\n }\n .df-table-wrapper th,\n .df-table-wrapper td,\n .fixed-cell {\n border: 0 none #ffffff;\n margin: 0;\n min-width: 50px;\n padding: 5px 20px 5px 10px;\n text-align: left;\n word-wrap: break-word;\n }\n .df-table-wrapper th {\n padding-bottom: 0;\n padding-top: 0;\n }\n .df-table-wrapper th div {\n max-height: 1px;\n visibility: hidden;\n }\n \n .df-schema-field {\n margin-left: 10px;\n }\n \n .fixed-header-container {\n overflow: hidden;\n position: relative;\n }\n .fixed-header {\n border-bottom: 2px solid #000;\n display: table;\n position: relative;\n }\n .fixed-row {\n display: table-row;\n }\n .fixed-cell {\n display: table-cell;\n }\n </style>\n \n \n <div class=\"df-table-wrapper df-table-wrapper-98230c54 panel-group pd_save\">\n <!-- dataframe schema -->\n \n <!-- dataframe table -->\n <div class=\"panel panel-default\" style=\"border:none;\">\n \n <div id=\"df-table-98230c54\" class=\"panel-collapse collapse in\">\n <div class=\"panel-body\">\n \n <div>\n \n </div>\n <!-- fixed header for when dataframe table scrolls -->\n <div class=\"fixed-header-container\">\n <div class=\"fixed-header\" style=\"width: 1681px;\">\n <div class=\"fixed-row\">\n \n <div class=\"fixed-cell\" style=\"width: 102px;\">date</div>\n \n <div class=\"fixed-cell\" style=\"width: 120px;\">homeTeamWin</div>\n \n <div class=\"fixed-cell\" style=\"width: 97px;\">homeTeam</div>\n \n <div class=\"fixed-cell\" style=\"width: 145px;\">homeTeamDaysOff</div>\n \n <div class=\"fixed-cell\" style=\"width: 198px;\">homeTeamTotalWinPercent</div>\n \n <div class=\"fixed-cell\" style=\"width: 149px;\">homeTeamLastFive</div>\n \n <div class=\"fixed-cell\" style=\"width: 145px;\">homeTeamLastTen</div>\n \n <div class=\"fixed-cell\" style=\"width: 95px;\">awayTeam</div>\n \n <div class=\"fixed-cell\" style=\"width: 143px;\">awayTeamDaysOff</div>\n \n <div class=\"fixed-cell\" 
style=\"width: 196px;\">awayTeamTotalWinPercent</div>\n \n <div class=\"fixed-cell\" style=\"width: 147px;\">awayTeamLastFive</div>\n \n <div class=\"fixed-cell\" style=\"width: 143px;\">awayTeamLastTen</div>\n \n </div>\n </div>\n </div>\n <div class=\"df-table-container\">\n <table class=\"df-table\">\n <thead>\n <tr>\n \n <th><div>date</div></th>\n \n <th><div>homeTeamWin</div></th>\n \n <th><div>homeTeam</div></th>\n \n <th><div>homeTeamDaysOff</div></th>\n \n <th><div>homeTeamTotalWinPercent</div></th>\n \n <th><div>homeTeamLastFive</div></th>\n \n <th><div>homeTeamLastTen</div></th>\n \n <th><div>awayTeam</div></th>\n \n <th><div>awayTeamDaysOff</div></th>\n \n <th><div>awayTeamTotalWinPercent</div></th>\n \n <th><div>awayTeamLastFive</div></th>\n \n <th><div>awayTeamLastTen</div></th>\n \n </tr>\n </thead>\n <tbody>\n \n <tr>\n \n <td>1484114400</td>\n \n <td>1</td>\n \n <td>MIN</td>\n \n <td>2</td>\n \n <td>315</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>HOU</td>\n \n <td>1</td>\n \n <td>775</td>\n \n <td>5</td>\n \n <td>9</td>\n \n </tr>\n \n <tr>\n \n <td>1482300000</td>\n \n <td>0</td>\n \n <td>DET</td>\n \n <td>1</td>\n \n <td>466</td>\n \n <td>1</td>\n \n <td>4</td>\n \n <td>MEM</td>\n \n <td>0</td>\n \n <td>600</td>\n \n <td>1</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1484287200</td>\n \n <td>1</td>\n \n <td>MIN</td>\n \n <td>2</td>\n \n <td>333</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>600</td>\n \n <td>3</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1483423200</td>\n \n <td>0</td>\n \n <td>DET</td>\n \n <td>2</td>\n \n <td>444</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>485</td>\n \n <td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1490331600</td>\n \n <td>0</td>\n \n <td>CHI</td>\n \n <td>2</td>\n \n <td>472</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>366</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n 
<td>1479276000</td>\n \n <td>1</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>363</td>\n \n <td>1</td>\n \n <td>4</td>\n \n <td>NO</td>\n \n <td>1</td>\n \n <td>181</td>\n \n <td>2</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1484719200</td>\n \n <td>1</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>380</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>395</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1479621600</td>\n \n <td>1</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>333</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>UTA</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1486706400</td>\n \n <td>0</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>169</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>MIA</td>\n \n <td>1</td>\n \n <td>433</td>\n \n <td>5</td>\n \n <td>10</td>\n \n </tr>\n \n <tr>\n \n <td>1486792800</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>2</td>\n \n <td>696</td>\n \n <td>4</td>\n \n <td>6</td>\n \n <td>PHO</td>\n \n <td>0</td>\n \n <td>314</td>\n \n <td>2</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1478498400</td>\n \n <td>0</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>UTA</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1483855200</td>\n \n <td>0</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>228</td>\n \n <td>0</td>\n \n <td>1</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>264</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1490331600</td>\n \n <td>1</td>\n \n <td>WAS</td>\n \n <td>1</td>\n \n <td>605</td>\n \n <td>2</td>\n \n <td>6</td>\n \n <td>BKN</td>\n \n <td>0</td>\n \n <td>211</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1486188000</td>\n \n <td>0</td>\n \n <td>PHO</td>\n \n <td>0</td>\n \n <td>320</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>MIL</td>\n \n <td>1</td>\n \n 
<td>428</td>\n \n <td>0</td>\n \n <td>1</td>\n \n </tr>\n \n <tr>\n \n <td>1489125600</td>\n \n <td>1</td>\n \n <td>DAL</td>\n \n <td>3</td>\n \n <td>428</td>\n \n <td>4</td>\n \n <td>6</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>174</td>\n \n <td>2</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1486101600</td>\n \n <td>1</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>372</td>\n \n <td>1</td>\n \n <td>2</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>600</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1488088800</td>\n \n <td>1</td>\n \n <td>LAC</td>\n \n <td>1</td>\n \n <td>603</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>CHA</td>\n \n <td>1</td>\n \n <td>431</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1483596000</td>\n \n <td>1</td>\n \n <td>IND</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>242</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1491714000</td>\n \n <td>0</td>\n \n <td>DEN</td>\n \n <td>1</td>\n \n <td>481</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>OKC</td>\n \n <td>1</td>\n \n <td>569</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1488434400</td>\n \n <td>1</td>\n \n <td>POR</td>\n \n <td>2</td>\n \n <td>406</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>583</td>\n \n <td>4</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1488780000</td>\n \n <td>1</td>\n \n <td>DET</td>\n \n <td>2</td>\n \n <td>483</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>CHI</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1488952800</td>\n \n <td>0</td>\n \n <td>NO</td>\n \n <td>1</td>\n \n <td>390</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>TOR</td>\n \n <td>4</td>\n \n <td>587</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1487916000</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>8</td>\n \n <td>491</td>\n 
\n <td>2</td>\n \n <td>5</td>\n \n <td>PHO</td>\n \n <td>8</td>\n \n <td>315</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1488952800</td>\n \n <td>1</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>467</td>\n \n <td>3</td>\n \n <td>7</td>\n \n <td>NY</td>\n \n <td>2</td>\n \n <td>406</td>\n \n <td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1482904800</td>\n \n <td>1</td>\n \n <td>ATL</td>\n \n <td>1</td>\n \n <td>483</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>NY</td>\n \n <td>2</td>\n \n <td>533</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1484892000</td>\n \n <td>1</td>\n \n <td>LAL</td>\n \n <td>3</td>\n \n <td>326</td>\n \n <td>0</td>\n \n <td>3</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>536</td>\n \n <td>4</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1479276000</td>\n \n <td>1</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>545</td>\n \n <td>1</td>\n \n <td>5</td>\n \n <td>HOU</td>\n \n <td>2</td>\n \n <td>600</td>\n \n <td>3</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1484978400</td>\n \n <td>1</td>\n \n <td>DET</td>\n \n <td>2</td>\n \n <td>454</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>WAS</td>\n \n <td>1</td>\n \n <td>547</td>\n \n <td>4</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1483855200</td>\n \n <td>0</td>\n \n <td>POR</td>\n \n <td>3</td>\n \n <td>421</td>\n \n <td>3</td>\n \n <td>3</td>\n \n <td>DET</td>\n \n <td>3</td>\n \n <td>447</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1491022800</td>\n \n <td>1</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>213</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>ORL</td>\n \n <td>0</td>\n \n <td>355</td>\n \n <td>1</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1490331600</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>4</td>\n \n <td>690</td>\n \n <td>4</td>\n \n <td>7</td>\n \n <td>NO</td>\n \n <td>3</td>\n \n <td>422</td>\n \n <td>4</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1486188000</td>\n 
\n <td>1</td>\n \n <td>UTA</td>\n \n <td>3</td>\n \n <td>620</td>\n \n <td>2</td>\n \n <td>7</td>\n \n <td>CHA</td>\n \n <td>2</td>\n \n <td>460</td>\n \n <td>0</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1478584800</td>\n \n <td>1</td>\n \n <td>MEM</td>\n \n <td>2</td>\n \n <td>428</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>500</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1480917600</td>\n \n <td>0</td>\n \n <td>LAL</td>\n \n <td>2</td>\n \n <td>454</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>UTA</td>\n \n <td>2</td>\n \n <td>571</td>\n \n <td>4</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1485842400</td>\n \n <td>1</td>\n \n <td>LAL</td>\n \n <td>5</td>\n \n <td>320</td>\n \n <td>1</td>\n \n <td>2</td>\n \n <td>DEN</td>\n \n <td>3</td>\n \n <td>456</td>\n \n <td>4</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1478498400</td>\n \n <td>1</td>\n \n <td>LAC</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>DET</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1490072400</td>\n \n <td>0</td>\n \n <td>POR</td>\n \n <td>2</td>\n \n <td>463</td>\n \n <td>4</td>\n \n <td>8</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>492</td>\n \n <td>3</td>\n \n <td>8</td>\n \n </tr>\n \n <tr>\n \n <td>1490677200</td>\n \n <td>0</td>\n \n <td>LAL</td>\n \n <td>2</td>\n \n <td>287</td>\n \n <td>1</td>\n \n <td>2</td>\n \n <td>WAS</td>\n \n <td>3</td>\n \n <td>616</td>\n \n <td>3</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1491627600</td>\n \n <td>1</td>\n \n <td>POR</td>\n \n <td>1</td>\n \n <td>493</td>\n \n <td>3</td>\n \n <td>7</td>\n \n <td>UTA</td>\n \n <td>1</td>\n \n <td>620</td>\n \n <td>4</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1486879200</td>\n \n <td>1</td>\n \n <td>SAC</td>\n \n <td>1</td>\n \n <td>407</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>388</td>\n \n 
<td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1479880800</td>\n \n <td>0</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>466</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>ATL</td>\n \n <td>0</td>\n \n <td>642</td>\n \n <td>2</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1484028000</td>\n \n <td>1</td>\n \n <td>GS</td>\n \n <td>2</td>\n \n <td>842</td>\n \n <td>4</td>\n \n <td>8</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>282</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1483336800</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>2</td>\n \n <td>470</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>CHA</td>\n \n <td>2</td>\n \n <td>558</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1485669600</td>\n \n <td>0</td>\n \n <td>SA</td>\n \n <td>1</td>\n \n <td>782</td>\n \n <td>4</td>\n \n <td>7</td>\n \n <td>DAL</td>\n \n <td>2</td>\n \n <td>347</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1486101600</td>\n \n <td>0</td>\n \n <td>POR</td>\n \n <td>3</td>\n \n <td>440</td>\n \n <td>4</td>\n \n <td>5</td>\n \n <td>DAL</td>\n \n <td>2</td>\n \n <td>387</td>\n \n <td>4</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1484978400</td>\n \n <td>1</td>\n \n <td>ATL</td>\n \n <td>0</td>\n \n <td>581</td>\n \n <td>3</td>\n \n <td>8</td>\n \n <td>PHI</td>\n \n <td>1</td>\n \n <td>365</td>\n \n <td>4</td>\n \n <td>8</td>\n \n </tr>\n \n <tr>\n \n <td>1485324000</td>\n \n <td>1</td>\n \n <td>MEM</td>\n \n <td>4</td>\n \n <td>565</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>622</td>\n \n <td>1</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1488088800</td>\n \n <td>1</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>586</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>POR</td>\n \n <td>2</td>\n \n <td>421</td>\n \n <td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1487138400</td>\n \n <td>1</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>553</td>\n \n <td>2</td>\n \n 
<td>4</td>\n \n <td>NY</td>\n \n <td>3</td>\n \n <td>410</td>\n \n <td>1</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1490072400</td>\n \n <td>1</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>414</td>\n \n <td>4</td>\n \n <td>6</td>\n \n <td>MEM</td>\n \n <td>2</td>\n \n <td>571</td>\n \n <td>4</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1479708000</td>\n \n <td>0</td>\n \n <td>MIN</td>\n \n <td>2</td>\n \n <td>333</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>BOS</td>\n \n <td>2</td>\n \n <td>538</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1486360800</td>\n \n <td>0</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>423</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>LAL</td>\n \n <td>2</td>\n \n <td>320</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1479016800</td>\n \n <td>1</td>\n \n <td>GS</td>\n \n <td>2</td>\n \n <td>777</td>\n \n <td>4</td>\n \n <td>7</td>\n \n <td>PHO</td>\n \n <td>0</td>\n \n <td>300</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1480658400</td>\n \n <td>1</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>MIN</td>\n \n <td>1</td>\n \n <td>277</td>\n \n <td>1</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1483336800</td>\n \n <td>0</td>\n \n <td>BKN</td>\n \n <td>3</td>\n \n <td>250</td>\n \n <td>1</td>\n \n <td>2</td>\n \n <td>UTA</td>\n \n <td>1</td>\n \n <td>617</td>\n \n <td>3</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1484719200</td>\n \n <td>1</td>\n \n <td>WAS</td>\n \n <td>2</td>\n \n <td>525</td>\n \n <td>4</td>\n \n <td>7</td>\n \n <td>MEM</td>\n \n <td>2</td>\n \n <td>581</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1480053600</td>\n \n <td>1</td>\n \n <td>CLE</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>DAL</td>\n \n <td>1</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1483250400</td>\n \n <td>1</td>\n \n 
<td>ATL</td>\n \n <td>1</td>\n \n <td>515</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>SA</td>\n \n <td>1</td>\n \n <td>818</td>\n \n <td>4</td>\n \n <td>9</td>\n \n </tr>\n \n <tr>\n \n <td>1488261600</td>\n \n <td>1</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>576</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>UTA</td>\n \n <td>2</td>\n \n <td>627</td>\n \n <td>3</td>\n \n <td>7</td>\n \n </tr>\n \n <tr>\n \n <td>1489381200</td>\n \n <td>1</td>\n \n <td>MEM</td>\n \n <td>1</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1484200800</td>\n \n <td>0</td>\n \n <td>PHO</td>\n \n <td>4</td>\n \n <td>315</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>DAL</td>\n \n <td>3</td>\n \n <td>289</td>\n \n <td>1</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1478844000</td>\n \n <td>1</td>\n \n <td>POR</td>\n \n <td>1</td>\n \n <td>555</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>SAC</td>\n \n <td>0</td>\n \n <td>400</td>\n \n <td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1480053600</td>\n \n <td>1</td>\n \n <td>POR</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1479880800</td>\n \n <td>1</td>\n \n <td>DET</td>\n \n <td>2</td>\n \n <td>400</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>307</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1484460000</td>\n \n <td>1</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>666</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>NY</td>\n \n <td>2</td>\n \n <td>450</td>\n \n <td>2</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1480140000</td>\n \n <td>1</td>\n \n <td>GS</td>\n \n <td>1</td>\n \n <td>875</td>\n \n <td>5</td>\n \n <td>10</td>\n \n <td>MIN</td>\n \n <td>1</td>\n \n <td>333</td>\n \n <td>2</td>\n \n <td>4</td>\n \n 
</tr>\n \n <tr>\n \n <td>1477890000</td>\n \n <td>0</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>333</td>\n \n <td>1</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>1</td>\n \n <td>1000</td>\n \n <td>2</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1478671200</td>\n \n <td>0</td>\n \n <td>ORL</td>\n \n <td>1</td>\n \n <td>428</td>\n \n <td>3</td>\n \n <td>3</td>\n \n <td>MIN</td>\n \n <td>0</td>\n \n <td>166</td>\n \n <td>1</td>\n \n <td>1</td>\n \n </tr>\n \n <tr>\n \n <td>1484978400</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>1</td>\n \n <td>477</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>SAC</td>\n \n <td>1</td>\n \n <td>380</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1490331600</td>\n \n <td>1</td>\n \n <td>LAL</td>\n \n <td>3</td>\n \n <td>281</td>\n \n <td>0</td>\n \n <td>1</td>\n \n <td>MIN</td>\n \n <td>3</td>\n \n <td>400</td>\n \n <td>1</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1479103200</td>\n \n <td>1</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>400</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>ORL</td>\n \n <td>1</td>\n \n <td>400</td>\n \n <td>2</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1483682400</td>\n \n <td>0</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>432</td>\n \n <td>2</td>\n \n <td>5</td>\n \n <td>HOU</td>\n \n <td>0</td>\n \n <td>756</td>\n \n <td>5</td>\n \n <td>8</td>\n \n </tr>\n \n <tr>\n \n <td>1477371600</td>\n \n <td>0</td>\n \n <td>GS</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>SA</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1486360800</td>\n \n <td>1</td>\n \n <td>TOR</td>\n \n <td>0</td>\n \n <td>596</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>LAC</td>\n \n <td>1</td>\n \n <td>607</td>\n \n <td>1</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1481868000</td>\n \n <td>1</td>\n \n <td>WAS</td>\n \n <td>2</td>\n \n <td>416</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>DET</td>\n \n 
<td>1</td>\n \n <td>518</td>\n \n <td>3</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1485324000</td>\n \n <td>1</td>\n \n <td>POR</td>\n \n <td>4</td>\n \n <td>413</td>\n \n <td>1</td>\n \n <td>4</td>\n \n <td>LAL</td>\n \n <td>3</td>\n \n <td>333</td>\n \n <td>1</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1491714000</td>\n \n <td>0</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>375</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>612</td>\n \n <td>4</td>\n \n <td>8</td>\n \n </tr>\n \n <tr>\n \n <td>1482732000</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>3</td>\n \n <td>709</td>\n \n <td>3</td>\n \n <td>8</td>\n \n <td>PHO</td>\n \n <td>2</td>\n \n <td>300</td>\n \n <td>1</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1483250400</td>\n \n <td>0</td>\n \n <td>MIA</td>\n \n <td>1</td>\n \n <td>294</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>DET</td>\n \n <td>1</td>\n \n <td>428</td>\n \n <td>1</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1485928800</td>\n \n <td>0</td>\n \n <td>OKC</td>\n \n <td>1</td>\n \n <td>571</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>489</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1491973200</td>\n \n <td>0</td>\n \n <td>OKC</td>\n \n <td>1</td>\n \n <td>580</td>\n \n <td>4</td>\n \n <td>6</td>\n \n <td>DEN</td>\n \n <td>0</td>\n \n <td>481</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1481868000</td>\n \n <td>0</td>\n \n <td>CHI</td>\n \n <td>1</td>\n \n <td>520</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>MIL</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>2</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1478322000</td>\n \n <td>0</td>\n \n <td>SA</td>\n \n <td>0</td>\n \n <td>833</td>\n \n <td>4</td>\n \n <td>5</td>\n \n <td>LAC</td>\n \n <td>1</td>\n \n <td>800</td>\n \n <td>4</td>\n \n <td>4</td>\n \n </tr>\n \n <tr>\n \n <td>1490331600</td>\n \n <td>1</td>\n \n <td>BOS</td>\n \n <td>2</td>\n \n 
<td>638</td>\n \n <td>4</td>\n \n <td>6</td>\n \n <td>PHO</td>\n \n <td>1</td>\n \n <td>305</td>\n \n <td>0</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1478844000</td>\n \n <td>1</td>\n \n <td>BOS</td>\n \n <td>2</td>\n \n <td>428</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>NY</td>\n \n <td>2</td>\n \n <td>428</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1482472800</td>\n \n <td>1</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>322</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>MIA</td>\n \n <td>1</td>\n \n <td>333</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1483423200</td>\n \n <td>1</td>\n \n <td>PHO</td>\n \n <td>0</td>\n \n <td>285</td>\n \n <td>1</td>\n \n <td>2</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>285</td>\n \n <td>0</td>\n \n <td>2</td>\n \n </tr>\n \n <tr>\n \n <td>1481522400</td>\n \n <td>1</td>\n \n <td>DAL</td>\n \n <td>2</td>\n \n <td>217</td>\n \n <td>2</td>\n \n <td>3</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>375</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1481090400</td>\n \n <td>0</td>\n \n <td>LAC</td>\n \n <td>3</td>\n \n <td>727</td>\n \n <td>2</td>\n \n <td>6</td>\n \n <td>GS</td>\n \n <td>2</td>\n \n <td>857</td>\n \n <td>4</td>\n \n <td>9</td>\n \n </tr>\n \n <tr>\n \n <td>1491022800</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>1</td>\n \n <td>480</td>\n \n <td>3</td>\n \n <td>5</td>\n \n <td>ATL</td>\n \n <td>2</td>\n \n <td>520</td>\n \n <td>2</td>\n \n <td>3</td>\n \n </tr>\n \n <tr>\n \n <td>1485583200</td>\n \n <td>0</td>\n \n <td>UTA</td>\n \n <td>1</td>\n \n <td>625</td>\n \n <td>3</td>\n \n <td>7</td>\n \n <td>MEM</td>\n \n <td>0</td>\n \n <td>562</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1479362400</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>1</td>\n \n <td>545</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>POR</td>\n \n <td>1</td>\n \n <td>583</td>\n \n <td>3</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n 
<td>1480053600</td>\n \n <td>0</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>WAS</td>\n \n <td>4</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1484460000</td>\n \n <td>0</td>\n \n <td>SAC</td>\n \n <td>1</td>\n \n <td>410</td>\n \n <td>1</td>\n \n <td>4</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>585</td>\n \n <td>3</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1485928800</td>\n \n <td>0</td>\n \n <td>PHO</td>\n \n <td>4</td>\n \n <td>312</td>\n \n <td>1</td>\n \n <td>3</td>\n \n <td>LAC</td>\n \n <td>4</td>\n \n <td>625</td>\n \n <td>1</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1479362400</td>\n \n <td>1</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>200</td>\n \n <td>0</td>\n \n <td>2</td>\n \n <td>MIL</td>\n \n <td>1</td>\n \n <td>500</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1488607200</td>\n \n <td>1</td>\n \n <td>SA</td>\n \n <td>0</td>\n \n <td>783</td>\n \n <td>5</td>\n \n <td>8</td>\n \n <td>MIN</td>\n \n <td>3</td>\n \n <td>409</td>\n \n <td>4</td>\n \n <td>6</td>\n \n </tr>\n \n <tr>\n \n <td>1487829600</td>\n \n <td>1</td>\n \n <td>GS</td>\n \n <td>8</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>LAC</td>\n \n <td>8</td>\n \n <td>0</td>\n \n <td>0</td>\n \n <td>0</td>\n \n </tr>\n \n <tr>\n \n <td>1489298400</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>2</td>\n \n <td>681</td>\n \n <td>3</td>\n \n <td>6</td>\n \n <td>CLE</td>\n \n <td>3</td>\n \n <td>671</td>\n \n <td>2</td>\n \n <td>5</td>\n \n </tr>\n \n <tr>\n \n <td>1489899600</td>\n \n <td>0</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>191</td>\n \n <td>2</td>\n \n <td>4</td>\n \n <td>DAL</td>\n \n <td>2</td>\n \n <td>426</td>\n \n <td>2</td>\n \n <td>6</td>\n \n </tr>\n \n </tbody>\n </table>\n </div>\n </div>\n </div>\n </div>\n </div>\n \n <script class=\"pd_save\">\n $(function() {\n var tableWrapper = $('.df-table-wrapper-98230c54');\n var 
fixedHeader = $('.fixed-header', tableWrapper);\n var tableContainer = $('.df-table-container', tableWrapper);\n var table = $('.df-table', tableContainer);\n var rows = $('tbody > tr', table);\n var total = 100;\n \n fixedHeader\n .css('width', table.width())\n .find('.fixed-cell')\n .each(function(i, e) {\n $(this).css('width', $('.df-table-wrapper-98230c54 th:nth-child(' + (i+1) + ')').css('width'));\n });\n \n tableContainer.scroll(function() {\n fixedHeader.css({ left: table.position().left });\n });\n \n rows.on(\"click\", function(e){\n var txt = e.delegateTarget.innerText;\n var splits = txt.split(\"\\t\");\n var len = splits.length;\n var hdrs = $(fixedHeader).find(\".fixed-cell\");\n // Add all cells in the selected row as a map to be consumed by the target as needed\n var payload = {type:\"select\", targetDivId: \"\" };\n for (var i = 0; i < len; i++) {\n payload[hdrs[i].innerHTML] = splits[i];\n }\n \n //simple selection highlighting, client adds \"selected\" class\n $(this).addClass(\"selected\").siblings().removeClass(\"selected\");\n $(document).trigger('pd_event', payload);\n });\n \n $('.df-table-search', tableWrapper).keyup(function() {\n var val = '^(?=.*\\\\b' + $.trim($(this).val()).split(/\\s+/).join('\\\\b)(?=.*\\\\b') + ').*$';\n var reg = RegExp(val, 'i');\n var index = 0;\n \n rows.each(function(i, e) {\n if (!reg.test($(this).text().replace(/\\s+/g, ' '))) {\n $(this).attr('class', 'hidden');\n }\n else {\n $(this).attr('class', (++index % 2 == 0 ? 'even' : 'odd'));\n }\n });\n $('.df-table-search-count', tableWrapper).html('Showing ' + index + ' of ' + total + ' rows');\n });\n });\n \n $(\".df-table-wrapper td:contains('http://')\").each(function(){var tc = this.textContent; $(this).wrapInner(\"<a target='_blank' href='\" + tc + \"'></a>\");});\n $(\".df-table-wrapper td:contains('https://')\").each(function(){var tc = this.textContent; $(this).wrapInner(\"<a target='_blank' href='\" + tc + \"'></a>\");});\n </script>\n \n </div>",
"text/plain": "<IPython.core.display.HTML object>"
},
"metadata": {}
}
],
"source": "display(df)"
},
{
"source": "### 1.2: Prepare data",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "In this subsection you will split your data into train and test datasets.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 38,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Number of records for training: 988\nNumber of records for evaluation: 242\n"
}
],
"source": "(train_data, test_data) = df.randomSplit([0.8, 0.2], 24)\nprint(\"Number of records for training: \" + str(train_data.count()))\nprint(\"Number of records for evaluation: \" + str(test_data.count()))"
},
{
"source": "As you can see, the data has been split into two datasets:\n - The train dataset, the larger of the two, is used for training.\n - The test dataset is used for model evaluation.",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "### 1.3: Create pipeline and train a model",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "In this section you will create an Apache\u00ae Spark machine learning pipeline and then train the model.",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "In the first step you need to import the Apache\u00ae Spark machine learning packages that will be needed in the subsequent steps.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 39,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from pyspark.ml import Pipeline, Model\nfrom pyspark.ml.feature import VectorAssembler\nfrom pyspark.ml.classification import LogisticRegression, RandomForestClassifier\nfrom pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator\n"
},
{
"source": "In the following step, create a feature vector by combining all features together.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 40,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "vectorAssembler_features = VectorAssembler(\n inputCols=[\n 'homeTeamHomeWinPercent',\n 'homeTeamLastFive',\n 'awayTeamAwayWinPercent',\n 'awayTeamLastFive'\n ], outputCol='features'\n)"
},
{
"source": "Next, define the estimator you want to use for classification. Logistic Regression is used in the following example.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 41,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "lr = LogisticRegression(labelCol=labelCol, featuresCol='features', maxIter=10, regParam=0.01, family=\"binomial\")"
},
{
"source": "Let's build the pipeline now. A pipeline consists of transformers and an estimator.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 42,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "pipeline_lr = Pipeline(stages=[vectorAssembler_features,lr])"
},
{
"source": "Now, you can train your Logistic Regression model by using the previously defined pipeline and train data.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 43,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "model = pipeline_lr.fit(train_data)"
},
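{
"source": "You can now check how well the model performs. First inspect the predictions and the area under the ROC curve that the model summary reports for the training data, then evaluate the model on the test data.",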
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 44,
"cell_type": "code",
"metadata": {
"pixiedust": {
"displayParams": {
"tableFields": "awayTeam,awayTeamLastFive,homeTeam,homeTeamLastFive,homeTeamWin,prediction",
"handlerId": "tableView"
}
}
},
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": "<style type=\"text/css\">.pd_warning{display:none;}</style><div class=\"pd_warning\"><em>Hey, there's something awesome here! To see it, open this notebook outside GitHub, in a viewer like Jupyter</em></div>\n <div class=\"pd_save is-viewer-good\" style=\"padding-right:10px;text-align: center;line-height:initial !important;font-size: xx-large;font-weight: 500;color: coral;\">\n \n </div>\n <div id=\"chartFigure2d655c46\" class=\"pd_save is-viewer-good\" style=\"overflow-x:auto\">\n <style type=\"text/css\" class=\"pd_save\">\n .df-table-wrapper .panel-heading {\n border-radius: 0;\n padding: 0px;\n }\n .df-table-wrapper .panel-heading:hover {\n border-color: #008571;\n }\n .df-table-wrapper .panel-title a {\n background-color: #f9f9fb;\n color: #333333;\n display: block;\n outline: none;\n padding: 10px 15px;\n text-decoration: none;\n }\n .df-table-wrapper .panel-title a:hover {\n background-color: #337ab7;\n border-color: #2e6da4;\n color: #ffffff;\n display: block;\n padding: 10px 15px;\n text-decoration: none;\n }\n .df-table-wrapper {\n font-size: small;\n font-weight: 300;\n letter-spacing: 0.5px;\n line-height: normal;\n height: inherit;\n overflow: auto;\n }\n .df-table-search {\n margin: 0 0 20px 0;\n }\n .df-table-search-count {\n display: inline-block;\n margin: 0 0 20px 0;\n }\n .df-table-container {\n max-height: 50vh;\n max-width: 100%;\n overflow-x: auto;\n position: relative;\n }\n .df-table-wrapper table {\n border: 0 none #ffffff;\n border-collapse: collapse;\n margin: 0;\n min-width: 100%;\n padding: 0;\n table-layout: fixed;\n height: inherit;\n overflow: auto;\n }\n .df-table-wrapper tr.hidden {\n display: none;\n }\n .df-table-wrapper tr:nth-child(even) {\n background-color: #f9f9fb;\n }\n .df-table-wrapper tr.even {\n background-color: #f9f9fb;\n }\n .df-table-wrapper tr.odd {\n background-color: #ffffff;\n }\n .df-table-wrapper td + td {\n border-left: 1px solid #e0e0e0;\n }\n \n .df-table-wrapper thead,\n .fixed-header {\n 
font-weight: 600;\n }\n .df-table-wrapper tr,\n .fixed-row {\n border: 0 none #ffffff;\n margin: 0;\n padding: 0;\n }\n .df-table-wrapper th,\n .df-table-wrapper td,\n .fixed-cell {\n border: 0 none #ffffff;\n margin: 0;\n min-width: 50px;\n padding: 5px 20px 5px 10px;\n text-align: left;\n word-wrap: break-word;\n }\n .df-table-wrapper th {\n padding-bottom: 0;\n padding-top: 0;\n }\n .df-table-wrapper th div {\n max-height: 1px;\n visibility: hidden;\n }\n \n .df-schema-field {\n margin-left: 10px;\n }\n \n .fixed-header-container {\n overflow: hidden;\n position: relative;\n }\n .fixed-header {\n border-bottom: 2px solid #000;\n display: table;\n position: relative;\n }\n .fixed-row {\n display: table-row;\n }\n .fixed-cell {\n display: table-cell;\n }\n </style>\n \n \n <div class=\"df-table-wrapper df-table-wrapper-2d655c46 panel-group pd_save\">\n <!-- dataframe schema -->\n \n <div class=\"panel panel-default\">\n <div class=\"panel-heading\">\n <h4 class=\"panel-title\" style=\"margin: 0px;\">\n <a data-toggle=\"collapse\" href=\"#df-schema-2d655c46\" data-parent=\"#df-table-wrapper-2d655c46\">Schema</a>\n </h4>\n </div>\n <div id=\"df-schema-2d655c46\" class=\"panel-collapse collapse\">\n <div class=\"panel-body\" style=\"font-family: monospace;\">\n <div class=\"df-schema-fields\">\n <div>Field types:</div>\n \n <div class=\"df-schema-field\"><strong>homeTeamWin: </strong> float64</div>\n \n <div class=\"df-schema-field\"><strong>homeTeam: </strong> object</div>\n \n <div class=\"df-schema-field\"><strong>homeTeamLastFive: </strong> int64</div>\n \n <div class=\"df-schema-field\"><strong>awayTeam: </strong> object</div>\n \n <div class=\"df-schema-field\"><strong>awayTeamLastFive: </strong> int64</div>\n \n <div class=\"df-schema-field\"><strong>prediction: </strong> float64</div>\n \n </div>\n </div>\n </div>\n </div>\n \n <!-- dataframe table -->\n <div class=\"panel panel-default\">\n \n <div class=\"panel-heading\">\n <h4 class=\"panel-title\" 
style=\"margin: 0px;\">\n <a data-toggle=\"collapse\" href=\"#df-table-2d655c46\" data-parent=\"#df-table-wrapper-2d655c46\"> Table</a>\n </h4>\n </div>\n \n <div id=\"df-table-2d655c46\" class=\"panel-collapse collapse in\">\n <div class=\"panel-body\">\n \n <input type=\"text\" class=\"df-table-search form-control input-sm\" placeholder=\"Search table\">\n \n <div>\n \n <span class=\"df-table-search-count\">Showing 100 of 988 rows</span>\n \n </div>\n <!-- fixed header for when dataframe table scrolls -->\n <div class=\"fixed-header-container\">\n <div class=\"fixed-header\" style=\"width: 873px;\">\n <div class=\"fixed-row\">\n \n <div class=\"fixed-cell\" style=\"width: 149px;\">homeTeamWin</div>\n \n <div class=\"fixed-cell\" style=\"width: 121px;\">homeTeam</div>\n \n <div class=\"fixed-cell\" style=\"width: 186px;\">homeTeamLastFive</div>\n \n <div class=\"fixed-cell\" style=\"width: 118px;\">awayTeam</div>\n \n <div class=\"fixed-cell\" style=\"width: 183px;\">awayTeamLastFive</div>\n \n <div class=\"fixed-cell\" style=\"width: 117px;\">prediction</div>\n \n </div>\n </div>\n </div>\n <div class=\"df-table-container\">\n <table class=\"df-table\">\n <thead>\n <tr>\n \n <th><div>homeTeamWin</div></th>\n \n <th><div>homeTeam</div></th>\n \n <th><div>homeTeamLastFive</div></th>\n \n <th><div>awayTeam</div></th>\n \n <th><div>awayTeamLastFive</div></th>\n \n <th><div>prediction</div></th>\n \n </tr>\n </thead>\n <tbody>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>3</td>\n \n <td>PHI</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>1</td>\n \n <td>DAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>LAC</td>\n \n <td>5</td>\n \n <td>SAC</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>PHO</td>\n \n <td>1</td>\n \n <td>TOR</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>BOS</td>\n \n 
<td>2</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>PHI</td>\n \n <td>0</td>\n \n <td>IND</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>DEN</td>\n \n <td>3</td>\n \n <td>MIN</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>CHA</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NO</td>\n \n <td>3</td>\n \n <td>CHI</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>NY</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIN</td>\n \n <td>1</td>\n \n <td>DET</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>LAL</td>\n \n <td>2</td>\n \n <td>SAC</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>SAC</td>\n \n <td>2</td>\n \n <td>CLE</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>DAL</td>\n \n <td>2</td>\n \n <td>PHO</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>WAS</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>TOR</td>\n \n <td>5</td>\n \n <td>CLE</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NY</td>\n \n <td>0</td>\n \n <td>BOS</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIL</td>\n \n <td>3</td>\n \n <td>WAS</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>UTA</td>\n \n 
<td>2</td>\n \n <td>MIL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>DAL</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>PHO</td>\n \n <td>2</td>\n \n <td>SAC</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>PHO</td>\n \n <td>1</td>\n \n <td>DAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>CLE</td>\n \n <td>4</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIA</td>\n \n <td>3</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>BOS</td>\n \n <td>5</td>\n \n <td>LAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>CLE</td>\n \n <td>2</td>\n \n <td>UTA</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>CHA</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>POR</td>\n \n <td>1</td>\n \n <td>OKC</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>CLE</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>HOU</td>\n \n <td>2</td>\n \n <td>ATL</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>TOR</td>\n \n <td>2</td>\n \n <td>IND</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>CLE</td>\n \n <td>4</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>UTA</td>\n \n <td>2</td>\n \n <td>DAL</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>GS</td>\n \n <td>4</td>\n \n <td>CHA</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>POR</td>\n \n 
<td>4</td>\n \n <td>MIN</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>BKN</td>\n \n <td>2</td>\n \n <td>ORL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>LAL</td>\n \n <td>1</td>\n \n <td>DEN</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>3</td>\n \n <td>SA</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>POR</td>\n \n <td>3</td>\n \n <td>GS</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>PHO</td>\n \n <td>2</td>\n \n <td>DEN</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>DAL</td>\n \n <td>3</td>\n \n <td>ORL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>POR</td>\n \n <td>2</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>LAL</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>DEN</td>\n \n <td>4</td>\n \n <td>LAC</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>LAC</td>\n \n <td>2</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NO</td>\n \n <td>0</td>\n \n <td>DEN</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>UTA</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MIA</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BOS</td>\n \n <td>4</td>\n \n <td>MIL</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>MIA</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>TOR</td>\n \n 
<td>4</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NY</td>\n \n <td>3</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>ATL</td>\n \n <td>4</td>\n \n <td>LAC</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>BOS</td>\n \n <td>4</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>SA</td>\n \n <td>4</td>\n \n <td>GS</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>NO</td>\n \n <td>4</td>\n \n <td>MEM</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>BOS</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>CHI</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>CLE</td>\n \n <td>2</td>\n \n <td>SAC</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>1</td>\n \n <td>WAS</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>ATL</td>\n \n <td>3</td>\n \n <td>PHI</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>NY</td>\n \n <td>3</td>\n \n <td>MIN</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MIN</td>\n \n <td>2</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NO</td>\n \n <td>0</td>\n \n <td>GS</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BOS</td>\n \n <td>3</td>\n \n <td>GS</td>\n \n <td>5</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIL</td>\n \n <td>2</td>\n \n <td>CLE</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MIN</td>\n \n 
<td>4</td>\n \n <td>ORL</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>WAS</td>\n \n <td>3</td>\n \n <td>TOR</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>2</td>\n \n <td>HOU</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>0</td>\n \n <td>ORL</td>\n \n <td>0</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MEM</td>\n \n <td>4</td>\n \n <td>CLE</td>\n \n <td>5</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NY</td>\n \n <td>0</td>\n \n <td>MIL</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>MIL</td>\n \n <td>3</td>\n \n <td>MIA</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>PHO</td>\n \n <td>2</td>\n \n <td>DAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>OKC</td>\n \n <td>2</td>\n \n <td>CLE</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BKN</td>\n \n <td>1</td>\n \n <td>BOS</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>NY</td>\n \n <td>1</td>\n \n <td>NO</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>TOR</td>\n \n <td>1</td>\n \n <td>ORL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>SA</td>\n \n <td>4</td>\n \n <td>IND</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>BKN</td>\n \n <td>0</td>\n \n <td>MEM</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIA</td>\n \n <td>2</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>LAL</td>\n \n <td>0</td>\n \n <td>MIN</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>CLE</td>\n \n 
<td>4</td>\n \n <td>LAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>DAL</td>\n \n <td>3</td>\n \n <td>HOU</td>\n \n <td>3</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>ATL</td>\n \n <td>5</td>\n \n <td>BOS</td>\n \n <td>4</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>DEN</td>\n \n <td>2</td>\n \n <td>POR</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIL</td>\n \n <td>4</td>\n \n <td>DAL</td>\n \n <td>1</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>POR</td>\n \n <td>4</td>\n \n <td>CHI</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>UTA</td>\n \n <td>4</td>\n \n <td>OKC</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MEM</td>\n \n <td>2</td>\n \n <td>BOS</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>CHA</td>\n \n <td>1</td>\n \n <td>PHI</td>\n \n <td>2</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>TOR</td>\n \n <td>2</td>\n \n <td>BOS</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>PHO</td>\n \n <td>1</td>\n \n <td>HOU</td>\n \n <td>4</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>MIN</td>\n \n <td>2</td>\n \n <td>MIA</td>\n \n <td>5</td>\n \n <td>0.0</td>\n \n </tr>\n \n <tr>\n \n <td>0.0</td>\n \n <td>DAL</td>\n \n <td>4</td>\n \n <td>PHO</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>HOU</td>\n \n <td>3</td>\n \n <td>POR</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>WAS</td>\n \n <td>3</td>\n \n <td>CHA</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n <tr>\n \n <td>1.0</td>\n \n <td>POR</td>\n \n <td>3</td>\n \n <td>MIN</td>\n \n <td>3</td>\n \n <td>1.0</td>\n \n </tr>\n \n </tbody>\n </table>\n </div>\n </div>\n 
</div>\n </div>\n </div>\n \n <script class=\"pd_save\">\n $(function() {\n var tableWrapper = $('.df-table-wrapper-2d655c46');\n var fixedHeader = $('.fixed-header', tableWrapper);\n var tableContainer = $('.df-table-container', tableWrapper);\n var table = $('.df-table', tableContainer);\n var rows = $('tbody > tr', table);\n var total = 100;\n \n fixedHeader\n .css('width', table.width())\n .find('.fixed-cell')\n .each(function(i, e) {\n $(this).css('width', $('.df-table-wrapper-2d655c46 th:nth-child(' + (i+1) + ')').css('width'));\n });\n \n tableContainer.scroll(function() {\n fixedHeader.css({ left: table.position().left });\n });\n \n rows.on(\"click\", function(e){\n var txt = e.delegateTarget.innerText;\n var splits = txt.split(\"\\t\");\n var len = splits.length;\n var hdrs = $(fixedHeader).find(\".fixed-cell\");\n // Add all cells in the selected row as a map to be consumed by the target as needed\n var payload = {type:\"select\", targetDivId: \"\" };\n for (var i = 0; i < len; i++) {\n payload[hdrs[i].innerHTML] = splits[i];\n }\n \n //simple selection highlighting, client adds \"selected\" class\n $(this).addClass(\"selected\").siblings().removeClass(\"selected\");\n $(document).trigger('pd_event', payload);\n });\n \n $('.df-table-search', tableWrapper).keyup(function() {\n var val = '^(?=.*\\\\b' + $.trim($(this).val()).split(/\\s+/).join('\\\\b)(?=.*\\\\b') + ').*$';\n var reg = RegExp(val, 'i');\n var index = 0;\n \n rows.each(function(i, e) {\n if (!reg.test($(this).text().replace(/\\s+/g, ' '))) {\n $(this).attr('class', 'hidden');\n }\n else {\n $(this).attr('class', (++index % 2 == 0 ? 
'even' : 'odd'));\n }\n });\n $('.df-table-search-count', tableWrapper).html('Showing ' + index + ' of ' + total + ' rows');\n });\n });\n \n $(\".df-table-wrapper td:contains('http://')\").each(function(){var tc = this.textContent; $(this).wrapInner(\"<a target='_blank' href='\" + tc + \"'></a>\");});\n $(\".df-table-wrapper td:contains('https://')\").each(function(){var tc = this.textContent; $(this).wrapInner(\"<a target='_blank' href='\" + tc + \"'></a>\");});\n </script>\n \n </div>",
"text/plain": "<IPython.core.display.HTML object>"
},
"metadata": {}
}
],
"source": "rawpredictionsdf = model.stages[1].summary.predictions\ndisplay(rawpredictionsdf)"
},
{
"execution_count": 45,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Train Area Under ROC = 0.627433\n"
}
],
"source": "train_area_under_roc = model.stages[1].summary.areaUnderROC\nprint(\"Train Area Under ROC = %g\" % train_area_under_roc)"
},
{
"execution_count": 46,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "Test Area Under ROC = 0.590336\n"
}
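],
"source": "predictions = model.transform(test_data)\n# Note: rawPredictionCol=\"prediction\" scores the hard 0/1 labels rather than the model's\n# confidence scores; pass \"rawPrediction\" to compute the ROC from the scores instead.\nevaluator = BinaryClassificationEvaluator(labelCol=labelCol,rawPredictionCol=\"prediction\", metricName=\"areaUnderROC\")\ntest_area_under_roc = evaluator.evaluate(predictions)\nprint(\"Test Area Under ROC = %g\" % test_area_under_roc)"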
},
{
"source": "<a id=\"load\"></a>\n## 3. Deploy model to Watson Machine Learning",
"cell_type": "markdown",
"metadata": {
"collapsed": true
}
},
{
"source": "In this section you will learn how to store the model in the Watson Machine Learning repository using a client library.\n\nFirst, import the client libraries.\n\n**Note**: Apache\u00ae Spark 2.0 is required.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 47,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "from repository_v3.mlrepositoryclient import MLRepositoryClient\nfrom repository_v3.mlrepositoryartifact import MLRepositoryArtifact\nfrom repository_v3.mlrepository import MetaProps, MetaNames\nimport json\nimport requests\nimport urllib3"
},
{
"source": "Provide the credentials for your Watson Machine Learning service instance on Bluemix.",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "**Action**: Enter the authentication information from your Watson Machine Learning service instance here.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 48,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "wml_credentials={\n \"url\": \"https://ibm-watson-ml.mybluemix.net\",\n \"access_key\": \"xxx\",\n \"username\": \"xxx\",\n \"password\": \"xxx\",\n \"instance_id\": \"xxx\"\n}"
},
{
"execution_count": 49,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "# The code was removed by DSX for sharing."
},
{
"source": "**Tip**: `url`, `instance_id`, `username` and `password` can be found on the **Service Credentials** tab of the service instance created in Bluemix. If you cannot see the **instance_id** field in **Service Credentials**, generate new credentials by pressing the **New credential (+)** button.",
"cell_type": "markdown",
"metadata": {
"collapsed": true
}
},
{
"source": "Authenticate to the Watson Machine Learning service on IBM Cloud.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 50,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "ml_repository_client = MLRepositoryClient(wml_credentials['url'])\nml_repository_client.authorize(wml_credentials['username'], wml_credentials['password'])"
},
{
"source": "Create model artifact (abstraction layer).",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 51,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "pipeline_artifact = MLRepositoryArtifact(pipeline_lr, name=\"pipeline\")"
},
{
"source": "**Tip**: The `MLRepositoryArtifact` constructor expects a trained model object, training data, and a model name. (This model name is the one displayed by the Watson Machine Learning service.)",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 52,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "deployment_name = 'NBA Game Predictor POSTGRESQL'\nmodel_artifact = MLRepositoryArtifact(model, pipeline_artifact=pipeline_artifact, training_data=train_data, name=deployment_name)"
},
{
"source": "## <font color=\"red\">Save the model</font>",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 53,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": "saved_model = ml_repository_client.models.save(model_artifact)"
},
{
"source": "Get saved model metadata from Watson Machine Learning.",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "**Tip**: Use `saved_model.meta.available_props()` to get the list of available properties.",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 54,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"execution_count": 54,
"metadata": {},
"data": {
"text/plain": "dict_keys(['trainingDataReference', 'frameworkVersion', 'inputDataSchema', 'tags', 'hyperParameters', 'version', 'frameworkName', 'label_column', 'trainingDataSchema', 'framework_runtimes', 'contentStatus', 'trainingDefinitionVersionUrl', 'creationTime', 'modelVersionUrl', 'runtimes'])"
},
"output_type": "execute_result"
}
],
"source": "saved_model.meta.available_props()"
},
{
"execution_count": 55,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "creationTime: 2018-04-19 19:20:46.221000+00:00\nmodelVersionUrl: https://ibm-watson-ml.mybluemix.net/v3/ml_assets/models/6bfc2ad3-73af-4a23-a033-9fbb5a0e7f78/versions/4eef4da5-acfa-4c52-a211-550452ec957e\nlabel: homeTeamWin\nmodelID: 6bfc2ad3-73af-4a23-a033-9fbb5a0e7f78\n"
}
],
"source": "print(\"creationTime: \" + str(saved_model.meta.prop(\"creationTime\")))\nprint(\"modelVersionUrl: \" + saved_model.meta.prop(\"modelVersionUrl\"))\nprint(\"label: \" + saved_model.meta.prop(\"label_column\"))\nprint(\"modelID: \" + str(saved_model.uid))"
},
{
"source": "**Tip**: `modelID` is the model's unique identifier in the Watson Machine Learning repository.",
"cell_type": "markdown",
"metadata": {}
},
{
"source": "<a id=\"deployment\"></a>\n## 4. Create a deployment",
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 56,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"execution_count": 56,
"metadata": {},
"data": {
"text/plain": "'Bearer eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJ0ZW5hbnRJZCI6ImVmY2VkNzExLTY1YjctNDI1OS04OTkxLTBiZDJmN2YyOWQwZSIsImluc3RhbmNlSWQiOiJlZmNlZDcxMS02NWI3LTQyNTktODk5MS0wYmQyZjdmMjlkMGUiLCJwbGFuSWQiOiIzZjZhY2Y0My1lZGU4LTQxM2EtYWM2OS1mOGFmM2JiMGNiZmUiLCJyZWdpb24iOiJ1cy1zb3V0aCIsInVzZXJJZCI6IjA2MDBkZDk3LTlhYmMtNGY3MC1iMjQ5LTNhOGM5ODA2MTA1YyIsImlzcyI6Imh0dHBzOi8vaWJtLXdhdHNvbi1tbC5teWJsdWVtaXgubmV0L3YzL2lkZW50aXR5IiwiaWF0IjoxNTI0MTY1NjQ5LCJleHAiOjE1MjQxOTQ0NDl9.i3ffO5Pq-xYaKkug6zJi2SshpHFK186k_goBp5Fo3UPSlXxQ8NflIAcXD8pkpwmwcFNO2xnE9RUsUYwEZlewl-4aOftZOSWIgWR6u7h1wVCjwoVGmAU6JZL4vdYc5UnXfRbzLTXz0Py8q-2edZ8EAhuU0XbhFWhOoQS7ZoHTdRiVcK1-R8uT8YDr-q2RePDJDIzVTeAne0PQK6zV8MrkaQm3jh6grmsD4vDYFr510OSDJJoib03UOSpuzGP-njLW3T1wP819cbIqsYkhf0yAlbiWGb7PYJDY4BjOby_roOUDpaeTpkXa9mDP6-Ng31ctidrYYrBavSAvKxX_HkZVcA'"
},
"output_type": "execute_result"
}
],
"source": "headers = urllib3.util.make_headers(basic_auth='{}:{}'.format(wml_credentials['username'], wml_credentials['password']))\nurl = '{}/v3/identity/token'.format(wml_credentials['url'])\nresponse = requests.get(url, headers=headers)\n# The Authorization header value is the scheme name, a space, then the token.\nml_token = 'Bearer ' + json.loads(response.text).get('token')\nml_token"
},
{
"execution_count": 57,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"execution_count": 57,
"metadata": {},
"data": {
"text/plain": "'https://ibm-watson-ml.mybluemix.net/v3/wml_instances/efced711-65b7-4259-8991-0bd2f7f29d0e/published_models/6bfc2ad3-73af-4a23-a033-9fbb5a0e7f78/deployments/b4b09d68-8f3e-4524-9857-208b0f8ac5b2/online'"
},
"output_type": "execute_result"
}
],
"source": "deployment_url = wml_credentials['url'] + \"/v3/wml_instances/\" + wml_credentials['instance_id'] + \"/published_models/\" + saved_model.uid + \"/deployments/\"\ndeployment_header = {'Content-Type': 'application/json', 'Authorization': ml_token}\ndeployment_payload = {\"type\": \"online\", \"name\": deployment_name}\ndeployment_response = requests.post(deployment_url, json=deployment_payload, headers=deployment_header)\ndeployment_response.text\nscoring_url = json.loads(deployment_response.text).get('entity').get('scoring_url')\nscoring_url"
},
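{
"source": "<a id=\"test\"></a>\n## 5. Test the deployment\n\nScore the April 19, 2018 matchup of the Philadelphia 76ers at the Miami Heat, with Miami as the home team. Win percentages are given as three-digit values (634 means .634), matching the call below.\n\nPHI (away)\n\n- season win %: 634\n- wins last 5: 4\n\nMIA (home)\n\n- season win %: 537\n- wins last 5: 2\n",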
"cell_type": "markdown",
"metadata": {}
},
{
"execution_count": 58,
"cell_type": "code",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": "{\n \"fields\": [\"homeTeamHomeWinPercent\", \"homeTeamLastFive\", \"awayTeamAwayWinPercent\", \"awayTeamLastFive\", \"features\", \"rawPrediction\", \"probability\", \"prediction\"],\n \"values\": [[537, 2, 634, 4, [537.0, 2.0, 634.0, 4.0], [0.19970850501247311, -0.19970850501247311], [0.5497618464246237, 0.4502381535753764], 0.0]]\n}\n"
}
],
"source": "def get_prediction_from_watson_ml(homeTeamHomeWinPercent, homeTeamLastFive, awayTeamAwayWinPercent, awayTeamLastFive):\n scoring_header = {'Content-Type': 'application/json', 'Authorization': ml_token}\n scoring_payload = {'fields': ['homeTeamHomeWinPercent','homeTeamLastFive','awayTeamAwayWinPercent','awayTeamLastFive'], 'values': [[homeTeamHomeWinPercent, homeTeamLastFive, awayTeamAwayWinPercent, awayTeamLastFive]]}\n scoring_response = requests.post(scoring_url, json=scoring_payload, headers=scoring_header)\n return scoring_response.text\n\nresponse = get_prediction_from_watson_ml(537, 2, 634, 4)\nprint(response)\n"
},
{
"execution_count": null,
"cell_type": "code",
"metadata": {},
"outputs": [],
"source": ""
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.5 with Spark 2.1",
"name": "python3-spark21",
"language": "python"
},
"language_info": {
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"version": "3.5.4",
"name": "python",
"file_extension": ".py",
"pygments_lexer": "ipython3",
"codemirror_mode": {
"version": 3,
"name": "ipython"
}
}
},
"nbformat": 4
}