
nflfastR Python Guide

August 2021: This is the first refresh of this guide. My hope is that the guide remains a constant work in progress, receiving updates as requests come in and as I discover new things myself. This update features more data visualization examples and a more detailed filtering section. I've also posted all relevant code from the guide to github in a Jupyter Notebook, found here.

This guide is an update to my original nflscrapR Python Guide. nflscrapR became defunct in 2020 and nflfastR has taken its place. As the name implies, nflfastR makes scraping new play-by-play data much faster.

Using Jupyter Notebooks or JupyterLab, which come pre-installed with Anaconda, is typically the best way to work with data in Python. This guide assumes you are using the Anaconda distribution and therefore already have the required packages installed. If you are not using Anaconda, install numpy, pandas, and matplotlib.
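
With a notebook open, the first step is importing pandas and pulling a season of nflfastR play-by-play data into a DataFrame. The snippet below is a minimal sketch of that idea rather than the guide's exact code; the nflverse-data release URL, the example season, and the column names shown are assumptions and may differ from the files the guide actually uses.

```python
# Minimal sketch: load one season of nflfastR play-by-play data into pandas.
# The release URL and column names below are assumptions, not taken from the guide.
import pandas as pd

YEAR = 2021  # example season

url = (
    "https://github.com/nflverse/nflverse-data/releases/download/pbp/"
    f"play_by_play_{YEAR}.csv.gz"
)

# read_csv handles gzip decompression; low_memory=False avoids mixed-dtype
# warnings on the very wide play-by-play table.
pbp = pd.read_csv(url, compression="gzip", low_memory=False)

print(pbp.shape)
print(pbp[["posteam", "defteam", "desc", "epa"]].head())
```

If the download is slow or you plan to revisit the data, saving the DataFrame locally (for example with pbp.to_csv or pbp.to_parquet) avoids re-downloading the file every session.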