Last active
January 28, 2025 00:45
Gist 903124/d304f76688b0699497a35b61b6d1e267
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to create an EPA rating yourself\n",
    "\n",
    "Using nfldb, which imports NFL data from NFL.com's JSON feeds into a database, you can calculate an expected points added (EPA) rating for every team, which gives a reasonable estimate of each NFL team's strength."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, [nfldb](https://github.com/BurntSushi/nfldb) and its PostgreSQL database have to be installed. Windows users can follow the [Windows installation guidelines](https://github.com/BurntSushi/nfldb/wiki/Detailed-Windows-PostgreSQL-installation)."
   ]
  },
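  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first step is to load the regular-season games used to fit the expected-points model. The query below loads the 2017 season; `season_year` also takes a list such as `[2015, 2016, 2017]` to pool the 2015-2017 seasons."
   ]
  },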
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import csv\n",
    "import math\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import nfldb\n",
    "\n",
    "# Running totals by yard line (index 0 = the offense's own 1-yard line):\n",
    "# EPA_observed accumulates drive points, EPA_play counts first-down snaps.\n",
    "EPA_observed = np.zeros(100)\n",
    "EPA_play = np.zeros(100)\n",
    "\n",
    "db = nfldb.connect()\n",
    "q = nfldb.Query(db)\n",
    "games = q.game(season_year=[2017], season_type='Regular').as_games()"
   ]
  },
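  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Second, go through every drive that ends in a score, a safety, a punt, a turnover or a missed field goal. For each first-down snap (pass attempt, rush or sack) in such a drive, record the snap's yard line and credit it with the drive's points: 7 for a touchdown, 3 for a field goal and -2 for a safety. Punts, turnovers and missed field goals count as 0 points in this first pass."
   ]
  },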
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Drive results used to fit the expected-points model\n",
    "DRIVE_RESULTS = ('Field Goal', 'Punt', 'Touchdown', 'Missed FG', 'Safety',\n",
    "                 'Fumble, Safety', 'Interception', 'Fumble')\n",
    "# Points credited in this first pass; any other result counts as 0\n",
    "FIRST_PASS_POINTS = {'Touchdown': 7, 'Field Goal': 3, 'Safety': -2, 'Fumble, Safety': -2}\n",
    "\n",
    "for game in games:\n",
    "    for drive in game.drives:\n",
    "        if drive.result not in DRIVE_RESULTS:\n",
    "            continue\n",
    "        for play in drive.plays:\n",
    "            # Keep only first-down scrimmage snaps (pass attempt, rush or sack)\n",
    "            is_snap = play.passing_att == 1 or play.rushing_att == 1 or play.passing_sk == 1\n",
    "            if not (is_snap and play.down == 1):\n",
    "                continue\n",
    "\n",
    "            # Tidy up the data: 'OWN 20' -> 20, 'OPP 30' -> 70, anything else -> 50\n",
    "            yard_split_str = str(play.yardline).split()\n",
    "            if yard_split_str[0] == 'OWN':\n",
    "                yardlinefromstr = int(yard_split_str[1])\n",
    "            elif yard_split_str[0] == 'OPP':\n",
    "                yardlinefromstr = 100 - int(yard_split_str[1])\n",
    "            else:\n",
    "                yardlinefromstr = 50\n",
    "\n",
    "            EPA_play[yardlinefromstr - 1] += 1\n",
    "            EPA_observed[yardlinefromstr - 1] += FIRST_PASS_POINTS.get(drive.result, 0)"
   ]
  },
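Stripped of the database access, the first pass does two things: it converts nfldb field-position strings such as `OWN 20` into yards from the offense's own goal line, and it averages drive points over the first-down snaps taken at each yard line. A minimal stdlib sketch of both, assuming the snaps have already been reduced to hypothetical `(field, result)` pairs; the helper names are illustrative and not part of the nfldb API:

```python
from collections import defaultdict

# Points credited per drive result in the first pass; any other result scores 0.
FIRST_PASS_POINTS = {'Touchdown': 7, 'Field Goal': 3, 'Safety': -2, 'Fumble, Safety': -2}


def field_to_yardline(field):
    """Map 'OWN 20' -> 20 and 'OPP 30' -> 70 (yards from the offense's goal); else 50."""
    parts = str(field).split()
    if parts and parts[0] == 'OWN':
        return int(parts[1])
    if parts and parts[0] == 'OPP':
        return 100 - int(parts[1])
    return 50


def first_pass_ep(snaps):
    """Average first-pass drive points per yard line from (field, result) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for field, result in snaps:
        yardline = field_to_yardline(field)
        totals[yardline] += FIRST_PASS_POINTS.get(result, 0)
        counts[yardline] += 1
    return {y: totals[y] / counts[y] for y in counts}


# Hypothetical first-down snaps: (field position string, result of the drive)
snaps = [('OWN 25', 'Touchdown'), ('OWN 25', 'Punt'), ('OPP 40', 'Field Goal')]
ep = first_pass_ep(snaps)  # {25: 3.5, 60: 3.0}
```

The two snaps at the offense's 25 average (7 + 0) / 2 = 3.5 points, and the single snap at the opponent's 40 (yard line 60) is worth 3.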
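  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dividing each yard line's point total by its number of snaps gives an expected-points (EP) estimate for that yard line. A drive that ends in a punt, an interception or a fumble should also be charged with the points the opponent is expected to score afterwards, so the totals are recomputed using the previous EP estimates: for a turnover, the opponent's EP at the spot where the drive ended is subtracted; for a punt, the opponent's EP at an estimated starting field position is subtracted. The calculation is repeated a few times (four passes here) so that the values converge."
   ]
  },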
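The punt adjustment in the next cell relies on a fixed quadratic, `-0.0116 * x**2 + 1.5343 * x + 37.91`, where x is the yard line (from the punting team's own goal) at which the punting drive ended. Reading its output as the spot, in the same coordinates, where the opponent takes over, the receiving team starts at 100 minus that value. The sketch below only restates that formula; this interpretation and the function names are assumptions:

```python
def punt_landing_spot(x):
    """Estimated spot after a punt from yard line x (yards from the punting
    team's own goal), using the quadratic coefficients from the notebook."""
    return -0.0116 * x * x + 1.5343 * x + 37.91


def receiving_yardline(x):
    """Receiving team's starting yard line, truncated to an integer as in the notebook."""
    return 100 - int(punt_landing_spot(x))


starts = [receiving_yardline(x) for x in (20, 35, 50)]  # [37, 23, 15]
```

A punt from the 20 thus leaves the opponent at its own 37, roughly a 43-yard net punt under this reading.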
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "DRIVE_RESULTS = ('Field Goal', 'Punt', 'Touchdown', 'Missed FG', 'Safety',\n",
    "                 'Fumble, Safety', 'Interception', 'Fumble')\n",
    "\n",
    "for i in range(4):\n",
    "    # Turn the running totals into average points per snap (expected points)\n",
    "    for j in range(100):\n",
    "        if EPA_play[j] > 0:\n",
    "            EPA_observed[j] = float(EPA_observed[j]) / float(EPA_play[j])\n",
    "\n",
    "    temp_EPA = EPA_observed  # expected points from the previous pass\n",
    "    EPA_observed = np.zeros(100)\n",
    "    EPA_play = np.zeros(100)\n",
    "\n",
    "    for game in games:\n",
    "        for drive in game.drives:\n",
    "            if drive.result not in DRIVE_RESULTS:\n",
    "                continue\n",
    "\n",
    "            # Where the drive ended, in yards from the offense's own goal line\n",
    "            end_field_split_str = str(drive.end_field).split()\n",
    "            if end_field_split_str[0] == 'OWN':\n",
    "                end_field_fromstr = int(end_field_split_str[1])\n",
    "            elif end_field_split_str[0] == 'OPP':\n",
    "                end_field_fromstr = 100 - int(end_field_split_str[1])\n",
    "            else:\n",
    "                end_field_fromstr = 50\n",
    "\n",
    "            for play in drive.plays:\n",
    "                # Keep only first-down scrimmage snaps (pass attempt, rush or sack)\n",
    "                is_snap = play.passing_att == 1 or play.rushing_att == 1 or play.passing_sk == 1\n",
    "                if not (is_snap and play.down == 1):\n",
    "                    continue\n",
    "\n",
    "                yard_split_str = str(play.yardline).split()\n",
    "                if yard_split_str[0] == 'OWN':\n",
    "                    yardlinefromstr = int(yard_split_str[1])\n",
    "                elif yard_split_str[0] == 'OPP':\n",
    "                    yardlinefromstr = 100 - int(yard_split_str[1])\n",
    "                else:\n",
    "                    yardlinefromstr = 50\n",
    "                idx = yardlinefromstr - 1\n",
    "\n",
    "                EPA_play[idx] += 1\n",
    "                if drive.result == 'Field Goal':\n",
    "                    EPA_observed[idx] += 3\n",
    "                elif drive.result == 'Touchdown':\n",
    "                    EPA_observed[idx] += 7\n",
    "                elif drive.result in ('Safety', 'Fumble, Safety'):\n",
    "                    EPA_observed[idx] -= 2\n",
    "                elif drive.result in ('Interception', 'Fumble'):\n",
    "                    # Opponent takes over 100 - end_field_fromstr yards from its own goal\n",
    "                    EPA_observed[idx] -= temp_EPA[100 - end_field_fromstr - 1]\n",
    "                elif drive.result == 'Punt':\n",
    "                    # Quadratic estimate of where the punt leaves the ball,\n",
    "                    # in yards from the punting team's own goal line\n",
    "                    punt_spot = int(-0.0116 * end_field_fromstr ** 2 + 1.5343 * end_field_fromstr + 37.91)\n",
    "                    EPA_observed[idx] -= temp_EPA[100 - punt_spot - 1]\n",
    "                # 'Missed FG' counts as a snap worth 0 points\n",
    "\n",
    "# Final pass: convert the totals into expected points per yard line\n",
    "for j in range(100):\n",
    "    if EPA_play[j] > 0:\n",
    "        EPA_observed[j] = float(EPA_observed[j]) / float(EPA_play[j])"
   ]
  },
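The loop above is a fixed-point iteration: each pass recomputes the expected points at yard line y as the average, over snaps from y, of the drive's points minus the opponent's expected points at its takeover spot, using the previous pass's estimates. A toy sketch of the same idea, assuming each drive is reduced to a hypothetical `(start, points, opp_start)` triple with invented numbers (not NFL data), shows the values settling down as passes are added:

```python
from collections import defaultdict


def iterate_ep(drives, passes=4):
    """Fixed-point iteration for expected points by yard line.

    drives: list of (start, points, opp_start); opp_start is the opponent's
    starting yard line after the drive, or None when no opponent EP is charged.
    """
    ep = defaultdict(float)  # first pass: every opponent EP counts as 0
    for _ in range(passes):
        totals = defaultdict(float)
        counts = defaultdict(int)
        for start, points, opp_start in drives:
            value = points
            if opp_start is not None:
                value -= ep[opp_start]  # uses the previous pass's estimates
            totals[start] += value
            counts[start] += 1
        ep = defaultdict(float, {y: totals[y] / counts[y] for y in counts})
    return ep


# Invented drives: from the 75, a touchdown or a punt that leaves the opponent at
# its 25; from the 25, a field goal or a turnover that gives the opponent its 75.
toy_drives = [(75, 7, None), (75, 0, 25), (25, 3, None), (25, 0, 75)]
ep = iterate_ep(toy_drives, passes=30)
```

For these drives the fixed point is EP(75) = 11/3 and EP(25) = -1/3, and each pass halves the remaining error, which is why a handful of passes is usually enough for the values to settle.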
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The observed expected-points values are noisy because of the random nature of the game, so we smooth the curve by fitting a fifth-degree polynomial over yard lines 1 to 99."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Smooth EP over yard lines 1-99 with a 5th-degree least-squares polynomial\n",
    "x = np.linspace(1, 99, num=99)\n",
    "cof = np.polyfit(x, EPA_observed[0:99], 5)\n",
    "EPA_observed_smooth = np.polyval(cof, x)  # index 0 = own 1-yard line"
   ]
  },
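`np.polyfit` returns least-squares polynomial coefficients, highest power first, and `np.polyval` evaluates them. As a NumPy-free illustration of what that fit computes, here is a small sketch that solves the normal equations by Gaussian elimination. It only demonstrates the idea: for a degree-5 fit over yard lines 1-99 the normal equations can be badly conditioned, so NumPy's solver remains the better choice in practice.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; coefficients are returned highest power
    first (the same order numpy.polyfit uses)."""
    xs = [float(x) for x in xs]
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution; coeffs[i] multiplies x^i
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = s / A[i][i]
    return coeffs[::-1]


def polyval(coeffs, x):
    """Evaluate highest-power-first coefficients with Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result
```

Fitting exact quadratic data, e.g. `polyfit(range(1, 11), [2*x*x - 3*x + 1 for x in range(1, 11)], 2)`, recovers coefficients of approximately `[2.0, -3.0, 1.0]`.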
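  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The EPA of a drive is the change in expected points over the drive: the value of its result (7 for a touchdown, 3 for a field goal, or the negative of the opponent's expected points after a punt or turnover, with a further 2 points subtracted for a safety) minus the expected points at its starting yard line. Adding up drive EPA gives each team's offensive EPA, subtracting its opponents' drive EPA gives its defensive EPA, and the sum of the two is the team's EPA."
   ]
  },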
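The rule described above can be written as a single function. This is a sketch rather than the notebook's code: `ep` is any callable that returns expected points for a yard line (the linear curve here is invented purely for illustration), `start` and `end` are yards from the offense's own goal line, and `next_start` is the opponent's starting yard line on the following drive. The result names are the nfldb drive results the notebook handles.

```python
TURNOVER_RESULTS = {'Missed FG', 'Interception', 'Fumble', 'Downs', 'Blocked FG',
                    'Blocked Punt', 'Blocked FG, Downs', 'Blocked Punt, Downs'}


def drive_epa(result, start, end, next_start, ep):
    """Expected points added by one drive: EP at the end minus EP at the start."""
    ep_start = ep(start)
    if result == 'Touchdown':
        ep_end = 7
    elif result == 'Field Goal':
        ep_end = 3
    elif result in TURNOVER_RESULTS:
        # Opponent takes over where the drive ended, 100 - end yards from its goal
        ep_end = -ep(100 - end)
    elif result == 'Punt':
        ep_end = -ep(next_start)
    elif result in ('Safety', 'Fumble, Safety'):
        ep_end = -2 - ep(next_start)
    else:
        # End of half or game, and any unlisted result, adds nothing
        ep_end = ep_start
    return ep_end - ep_start


def linear_ep(y):
    """Toy EP curve for illustration only: -1 at the own 1, +6 at the opponent's 1."""
    return -1.0 + 7.0 * (y - 1) / 98.0
```

With this toy curve a touchdown drive from midfield is worth 7 - 2.5 = 4.5 points, while an interception thrown at midfield costs 2.5 + 2.5 = 5 points.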
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "team_list = np.array(['ARI', 'ATL', 'BAL', 'BUF', 'CAR', 'CHI', 'CIN', 'CLE', 'DAL', 'DEN', 'DET',\n",
    "                      'GB', 'HOU', 'IND', 'JAX', 'KC', 'LAC', 'LA', 'MIA', 'MIN', 'NE', 'NO',\n",
    "                      'NYG', 'NYJ', 'OAK', 'PHI', 'PIT', 'SEA', 'SF', 'TB', 'TEN', 'WAS'])\n",
    "# Team codes changed in 2016 ('STL' -> 'LA') and 2017 ('SD' -> 'LAC'); for 2016,\n",
    "# drop 'LAC' and insert 'SD' between 'PIT' and 'SEA'.\n",
    "team_index = {str(team): i for i, team in enumerate(team_list)}\n",
    "\n",
    "def field_to_yardline(field):\n",
    "    # 'OWN 20' -> 20, 'OPP 30' -> 70 (yards from the offense's own goal); anything else -> 50\n",
    "    parts = str(field).split()\n",
    "    if parts[0] == 'OWN':\n",
    "        return int(parts[1])\n",
    "    if parts[0] == 'OPP':\n",
    "        return 100 - int(parts[1])\n",
    "    return 50\n",
    "\n",
    "def ep(yardline):\n",
    "    # Smoothed expected points, clamped to the fitted range of yard lines 1-99\n",
    "    return EPA_observed_smooth[min(max(yardline, 1), 99) - 1]\n",
    "\n",
    "# Results after which the opponent takes over where the drive ended\n",
    "TURNOVER_RESULTS = ('Missed FG', 'Interception', 'Fumble', 'Downs', 'Blocked FG',\n",
    "                    'Blocked Punt', 'Blocked FG, Downs', 'Blocked Punt, Downs')\n",
    "\n",
    "EPA_team = np.zeros(32)\n",
    "O_EPA = np.zeros(32)\n",
    "D_EPA = np.zeros(32)\n",
    "game_count = np.zeros(32)\n",
    "\n",
    "db = nfldb.connect()\n",
    "q = nfldb.Query(db)\n",
    "games = q.game(season_year=2018, season_type='Regular').as_games()\n",
    "\n",
    "epa_result = np.empty(0)\n",
    "nfl_week = 0  # latest week with a finished game\n",
    "for game in games:\n",
    "    if not game.finished:\n",
    "        continue\n",
    "    nfl_week = max(nfl_week, game.week)\n",
    "    game_count[team_index[game.home_team]] += 1\n",
    "    game_count[team_index[game.away_team]] += 1\n",
    "\n",
    "    drives = list(game.drives)\n",
    "    for k, drive in enumerate(drives):\n",
    "        if str(game.home_team) == str(drive.pos_team):\n",
    "            opp_team = str(game.away_team)\n",
    "        else:\n",
    "            opp_team = str(game.home_team)\n",
    "\n",
    "        yardlinefromstr = field_to_yardline(drive.start_field)\n",
    "        end_field_fromstr = field_to_yardline(drive.end_field)\n",
    "        # Opponent's starting yard line on the following drive (used for punts and safeties)\n",
    "        if k + 1 < len(drives):\n",
    "            next_drive_start_yardline = field_to_yardline(drives[k + 1].start_field)\n",
    "        else:\n",
    "            next_drive_start_yardline = 50\n",
    "\n",
    "        EP_start = ep(yardlinefromstr)\n",
    "        EP_end = EP_start  # 'End of Game', 'End of Half' and unlisted results add 0\n",
    "        if drive.result in TURNOVER_RESULTS:\n",
    "            EP_end = -ep(100 - end_field_fromstr)\n",
    "        elif drive.result == 'Punt':\n",
    "            EP_end = -ep(next_drive_start_yardline)\n",
    "        elif drive.result == 'Touchdown':\n",
    "            EP_end = 7\n",
    "        elif drive.result == 'Field Goal':\n",
    "            EP_end = 3\n",
    "        elif drive.result in ('Safety', 'Fumble, Safety'):\n",
    "            EP_end = -2 - ep(next_drive_start_yardline)\n",
    "\n",
    "        drive_epa = EP_end - EP_start\n",
    "        epa_result = np.append(epa_result, drive_epa)\n",
    "        EPA_team[team_index[drive.pos_team]] += drive_epa\n",
    "        O_EPA[team_index[drive.pos_team]] += drive_epa\n",
    "        EPA_team[team_index[opp_team]] -= drive_epa\n",
    "        D_EPA[team_index[opp_team]] -= drive_epa\n",
    "\n",
    "# Center offensive and defensive EPA on the league average\n",
    "O_EPA -= np.mean(O_EPA)\n",
    "D_EPA -= np.mean(D_EPA)"
   ]
  },
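The bookkeeping at the end of this cell credits each drive's EPA to the offense, charges it to the defense, and then centers the offensive and defensive ratings on the league mean. A stdlib sketch of that step, assuming drives have been reduced to hypothetical `(offense, defense, epa)` tuples; the team codes in the example are made up:

```python
from collections import defaultdict


def team_ratings(drives):
    """Return (offense, defense, total) EPA dicts from (offense, defense, epa) tuples.

    Offensive EPA sums a team's own drives; defensive EPA is minus the sum of its
    opponents' drives. Both are centered so that their mean over teams is 0,
    while the total (like EPA_team in the notebook) is left uncentered.
    """
    off = defaultdict(float)
    dfn = defaultdict(float)
    for offense, defense, epa in drives:
        off[offense] += epa
        dfn[defense] -= epa
    teams = set(off) | set(dfn)
    total = {t: off[t] + dfn[t] for t in teams}
    off_mean = sum(off[t] for t in teams) / len(teams)
    dfn_mean = sum(dfn[t] for t in teams) / len(teams)
    off_centered = {t: off[t] - off_mean for t in teams}
    dfn_centered = {t: dfn[t] - dfn_mean for t in teams}
    return off_centered, dfn_centered, total


example = [('AAA', 'BBB', 2.0), ('BBB', 'AAA', -1.0), ('CCC', 'AAA', 3.0)]
off, dfn, total = team_ratings(example)
```

Because every drive is added once and subtracted once, the uncentered totals always sum to zero across the league.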
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, each team's EPA is scaled to a 16-game pace and converted into a win probability with a logistic-regression model (coefficient 0.007849). Multiplying by 16 gives the estimated wins of each NFL team in the season queried above (2018 in the code)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "\n",
    "# EPA scaled to a 16-game pace (teams may have played different numbers of games)\n",
    "EPA_per_16_games = EPA_team / game_count * 16\n",
    "# Logistic model: estimated wins = 16 / (1 + e^(-0.007849 * EPA per 16 games))\n",
    "Estimated_wins = 16 / (1 + np.exp(-0.007849 * EPA_per_16_games))\n",
    "\n",
    "# Rescale the displayed EPA so that its standard deviation across teams is 80\n",
    "std_EPA = np.std(EPA_per_16_games)\n",
    "EPA_per_16_games = EPA_per_16_games * 80 / std_EPA\n",
    "\n",
    "output_df = pd.DataFrame({'Team name': team_list,\n",
    "                          'EPA per 16 games': EPA_per_16_games,\n",
    "                          'Estimated EPA wins': Estimated_wins,\n",
    "                          'Offensive EPA': O_EPA,\n",
    "                          'Defensive EPA': D_EPA})\n",
    "\n",
    "fig, ax = plt.subplots()\n",
    "ax.scatter(O_EPA, D_EPA)\n",
    "plt.xlabel('Offensive EPA')\n",
    "plt.ylabel('Defensive EPA')\n",
    "plt.title('NFL EPA week %d' % nfl_week)\n",
    "for i, txt in enumerate(team_list):\n",
    "    ax.annotate(txt, (O_EPA[i], D_EPA[i]))\n",
    "\n",
    "output_df = output_df.sort_values('EPA per 16 games')\n",
    "output_df"
   ]
  },
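The win conversion is a logistic curve in EPA per 16 games. A minimal sketch with the notebook's constants (coefficient 0.007849 and a 16-game season), using `math.exp` for the exponential; the function names are illustrative:

```python
import math


def epa_per_16(total_epa, games_played):
    """Scale season-to-date EPA to a 16-game pace."""
    return total_epa / float(games_played) * 16


def estimated_wins(epa16, coef=0.007849, games=16):
    """Logistic conversion of EPA per 16 games into expected wins."""
    return games / (1.0 + math.exp(-coef * epa16))


pace = epa_per_16(50.0, 8)   # 100.0 EPA per 16 games
wins = estimated_wins(pace)  # roughly 11 wins
```

An average team (0 EPA) comes out at exactly 8 wins, and because the curve is symmetric, teams at +x and -x EPA always sum to 16 estimated wins.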
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Optional) Replace the scatter-plot markers with team logos."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from matplotlib.offsetbox import AnnotationBbox, OffsetImage\n",
    "from PIL import Image\n",
    "try:\n",
    "    from urllib.request import urlopen  # Python 3\n",
    "except ImportError:\n",
    "    from urllib2 import urlopen  # Python 2, as used by this notebook's kernel\n",
    "\n",
    "fig = plt.gcf()\n",
    "fig.clf()\n",
    "ax = plt.subplot(111)\n",
    "\n",
    "# Logo URLs are in the third column; rows are assumed to follow the order of team_list\n",
    "url = 'https://raw.githubusercontent.com/statsbylopez/BlogPosts/master/nfl_teamlogos.csv'\n",
    "rows = csv.reader(urlopen(url).read().decode('utf-8').splitlines())\n",
    "\n",
    "for i, row in enumerate(rows):\n",
    "    if i == 0:\n",
    "        continue  # skip the header row\n",
    "    if i > len(team_list):\n",
    "        break\n",
    "    img = Image.open(io.BytesIO(urlopen(row[2]).read()))\n",
    "    imagebox = OffsetImage(np.asarray(img), zoom=1)\n",
    "    xy = [O_EPA[i - 1], D_EPA[i - 1]]\n",
    "    ab = AnnotationBbox(imagebox, xy,\n",
    "                        xybox=(0, 0),\n",
    "                        xycoords='data',\n",
    "                        boxcoords='offset points',\n",
    "                        frameon=False)\n",
    "    ax.add_artist(ab)\n",
    "\n",
    "ax.grid(True)\n",
    "plt.xlim([-200, 200])\n",
    "plt.ylim([-200, 200])\n",
    "plt.xlabel('Offensive EPA')\n",
    "plt.ylabel('Defensive EPA')\n",
    "plt.title('NFL EPA week %d' % nfl_week)\n",
    "plt.savefig('EPA_plot.png', dpi=400)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python [Root]",
   "language": "python",
   "name": "Python [Root]"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
This is the only open-source contribution I have been able to find showing EPA calculation using the nfldb data source. Thanks for posting this! For whatever it's worth this simplified code produces the same result but might be easier to read: