@Btibert3
Created February 9, 2011 00:21
###############################################################################
# Author: @BrockTibert
# Purpose: Collect Historical NHL Skater Stats 1960 - 2011 (in progress)
# Date: February 2011
#
# Used: R Version 2.12.1, Windows 7 Pro, StatET Plugin for Eclipse
#
# Copyright (c) 2011, under the Simplified BSD License.
# For more information on FreeBSD see: http://www.opensource.org/licenses/bsd-license.php
# All rights reserved.
###############################################################################
#-----------------------------------------------------------------------
# set up script level basics
#-----------------------------------------------------------------------
## libraries
library(XML)
## directory for the project
DIR <- "C:/Users/Brock/Documents/My Dropbox/Projects/NHL"
setwd(DIR)
#-----------------------------------------------------------------------
# Create a function that will take a year and return a dataframe
#-----------------------------------------------------------------------
GrabSkaters <- function(S) {
  # The function takes parameter S which is a string and represents the Season
  # Returns: data frame
  ## create the URL
  URL <- paste("http://www.hockey-reference.com/leagues/NHL_",
               S, "_skaters.html", sep="")
  ## grab the page -- the table is parsed nicely
  tables <- readHTMLTable(URL)
  ds.skaters <- tables$stats
  ## this assumes the HTML table was well formed (column names are the first
  ## record); if it were not, the column names would need to be forced
  ## I don't like dealing with factors if I don't have to
  ## and I prefer lower case
  for(i in 1:ncol(ds.skaters)) {
    ds.skaters[,i] <- as.character(ds.skaters[,i])
  }
  ## lower-case the column names once, after the loop
  names(ds.skaters) <- tolower(colnames(ds.skaters))
  ## fix a couple of the column names
  names(ds.skaters)[10] <- "plusmin"
  names(ds.skaters)[17] <- "spct"
  ## finally fix the columns - NAs forced by coercion warnings
  for(i in c(1, 3, 6:18)) {
    ds.skaters[,i] <- as.numeric(ds.skaters[, i])
  }
  ## convert toi from minutes to seconds per game
  ds.skaters$seconds <- (ds.skaters$toi*60)/ds.skaters$gp
  ## remove the repeated header rows and the totals (TOT) rows
  ds.skaters <- ds.skaters[!is.na(ds.skaters$rk), ]
  ds.skaters <- ds.skaters[ds.skaters$tm != "TOT", ]
  ## add the year
  ds.skaters$season <- S
  ## return the dataframe
  return(ds.skaters)
}
#-----------------------------------------------------------------------
# Use the function to loop over the seasons and piece together
#-----------------------------------------------------------------------
## define the seasons -- the 2005 dataset doesn't exist
## if I was a good coder I would trap the error, but this works
SEASON <- as.character(c(1960:2004, 2006:2011))
## create an empty dataset that we will append to
dataset <- data.frame()
## loop over the seasons, use the function to grab the data
## and build the dataset
for (S in SEASON) {
  temp <- GrabSkaters(S)
  dataset <- rbind(dataset, temp)
  print(paste("Completed Season ", S, sep=""))
  ## pause the script so we don't kill their servers
  Sys.sleep(3)
}
## save the dataset
write.table(dataset, "Historical Skater Stats.csv", sep=",",
            row.names=FALSE)
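
The script skips 2005 by hand and notes that trapping the error would be the better approach. A minimal sketch of that idea follows, assuming a wrapper around `tryCatch`. `SafeGrab`, `FakeGrab`, and `dataset.safe` are hypothetical names that are not part of the original script, and `FakeGrab` is only a stand-in so the example runs without touching the website.

```r
## Hypothetical wrapper: returns NULL instead of stopping when a season fails.
## `fetch` is any function that takes a season string and returns a data frame
## (for example, GrabSkaters from the script above).
SafeGrab <- function(S, fetch) {
  tryCatch(
    fetch(S),
    error = function(e) {
      message("Skipping season ", S, ": ", conditionMessage(e))
      NULL
    }
  )
}

## Stand-in fetcher so the sketch runs offline; it fails for 2005 on purpose.
FakeGrab <- function(S) {
  if (S == "2005") stop("no data for this season")
  data.frame(player = c("A", "B"), season = S, stringsAsFactors = FALSE)
}

seasons <- as.character(2004:2006)
results <- lapply(seasons, SafeGrab, fetch = FakeGrab)
## drop the failed seasons (NULL entries) and bind the rest in one step
results <- Filter(Negate(is.null), results)
dataset.safe <- do.call(rbind, results)
```

With the real function, `fetch = GrabSkaters` would take the place of `FakeGrab`, and a `Sys.sleep()` call between requests would still be needed to stay polite to the server. Collecting the pieces in a list and binding once also avoids growing the data frame on every pass through the loop.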
@ramnathv

ramnathv commented Feb 9, 2011

Nice post! You can considerably simplify your code by using functions from the package plyr. See the following gist for reference: https://gist.github.com/817883

Cheers,
Ramnath

@acompa

acompa commented Feb 9, 2011

This is the first time I've seen the XML library in use (new to R), so thanks for sharing your code!

@jianchongsu

Thank you!

@estepany

So if I wanted to do this by month (M), day (D), and year (Y), what alterations would I make? New to coding...

@cjgeek

cjgeek commented Aug 24, 2016

I'm facing this error:

Error: failed to load external entity "http://www.hockey-reference.com/leagues/NHL_1960_skaters.html"
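
This error generally means the parser could not retrieve the URL itself. A commonly reported cause is a site that now serves pages over HTTPS, which the XML package's built-in fetching does not handle. One possible workaround, sketched below, is to download the HTML as text first and hand that text to the parser. It assumes the page is reachable over HTTPS and that the skater table still has the id `stats`; neither is verified here. `ParseSkaterTable` is a hypothetical helper, and the tiny page is made-up data for an offline check.

```r
library(XML)

## Hypothetical helper: parse already-downloaded HTML text and return the
## table whose id is `table.id` ("stats" in the original script).
ParseSkaterTable <- function(html.text, table.id = "stats") {
  doc <- htmlParse(html.text, asText = TRUE)
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
  tables[[table.id]]
}

## Offline check with a tiny made-up page (not real data)
page <- paste0(
  "<html><body><table id='stats'>",
  "<thead><tr><th>Player</th><th>G</th></tr></thead>",
  "<tbody><tr><td>Example Skater</td><td>10</td></tr></tbody>",
  "</table></body></html>")
demo.tbl <- ParseSkaterTable(page)

## Against the live site, the text could be fetched first (assumed URL):
## html.text <- paste(readLines(
##   "https://www.hockey-reference.com/leagues/NHL_1960_skaters.html",
##   warn = FALSE), collapse = "\n")
## skaters <- ParseSkaterTable(html.text)
```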

@trisretnoaryani

trisretnoaryani commented Feb 7, 2017

Hi, can you help me? I'm facing this error:

Error in ds.skaters$toi * 60 : non-numeric argument to binary operator
Calls: ... withCallingHandlers -> withVisible -> eval -> eval -> GrabSkaters
In addition: Warning message:
In in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE, :
You changed the working directory to C:/Users/SV SEPAT/Documents/Tes R (probably via setwd()). It will be restored to C:/Users/SV Sepat/Documents/Tes R. See the Note section in ?knitr::knit
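
The `non-numeric argument` message means `toi` was still a character column when it was multiplied. The script converts columns by fixed position (`c(1, 3, 6:18)`), so one plausible cause, assuming the site's table layout has changed since 2011, is that `toi` no longer falls inside those positions. A minimal sketch of converting by column name instead follows. `ToNumericByName` is a hypothetical helper and the `toy` data frame is made-up data.

```r
## Hypothetical helper: coerce the named columns to numeric when present
ToNumericByName <- function(df, cols) {
  present <- intersect(cols, names(df))
  for (col in present) {
    ## strip thousands separators, then convert; non-numbers become NA
    df[[col]] <- suppressWarnings(as.numeric(gsub(",", "", df[[col]])))
  }
  df
}

## Made-up rows standing in for the scraped table
toy <- data.frame(player = c("A", "B"),
                  gp = c("82", "41"),
                  toi = c("1,640", "615"),
                  stringsAsFactors = FALSE)
## "pim" is not in `toy`, so it is silently skipped
toy <- ToNumericByName(toy, c("gp", "toi", "pim"))
toy$seconds <- (toy$toi * 60) / toy$gp
```

Checking `names(ds.skaters)` right after the table is read would show which columns the site currently provides and where `toi` sits.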
