Last active
October 13, 2015 22:23
cpsievert/da555f08f3c9ba2c0b8e
Function to get time-sequenced spatial locations of a baseball (optionally summarized over an arbitrary set of variables) using PITCHf/x
# install each package if it is missing, then make sure it is attached
if (!require("dplyr")) { install.packages("dplyr"); library("dplyr") }
if (!require("tidyr")) { install.packages("tidyr"); library("tidyr") }
if (!require("pitchRx")) { install.packages("pitchRx"); library("pitchRx") }

getLocations <- function(dat, ..., summarise = TRUE) {
  # select and group by columns specified in ...
  tb <- dat %>%
    select(..., x0:az) %>%
    group_by(...)
  vars <- as.character(attr(tb, "vars"))
  if (summarise) {
    # average the PITCHf/x parameters over variables specified in ...
    labs <- attr(tb, "labels")
    tb <- tb %>% summarise_each(funs(mean))
  } else {
    # another (more complex) way to get variable names
    # vars <- as.character(as.list(match.call(expand.dots = FALSE))$...)
    dat$pitch_id <- seq_len(nrow(dat))
    vars <- c(vars, "pitch_id")
    labs <- dat[vars]
  }
  # returns 3D array of locations of pitches over time
  value <- pitchRx::getSnapshots(as.data.frame(tb))
  idx <- labs %>% unite_("id", vars, sep = "@&")
  dimnames(value) <- list(idx = as.data.frame(idx)[, 1],
                          frame = seq_len(dim(value)[2]),
                          coordinate = c("x", "y", "z"))
  # tidy things up in a format that ggplot would expect
  value %>% as.tbl_cube() %>% as.data.frame() %>% rename_(value = ".") %>%
    mutate(idx = as.character(idx)) %>%
    separate(idx, vars, sep = "@&") %>%
    spread(coordinate, value)
}
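The last step of the function turns a 3D array (id by frame by coordinate) into one row per id and frame, with x, y and z as columns. As a rough illustration of that reshaping, here is a base R sketch on a made-up array. The numbers and the "FF@&1"-style ids are invented for the example, and the `pitch_type` and `pitch_id` columns simply mimic what `separate()` would produce. It is not the code the function itself runs.

```r
# toy 3D array: 2 pitches x 3 frames x 3 coordinates (made-up numbers)
value <- array(seq_len(18), dim = c(2, 3, 3),
               dimnames = list(idx = c("FF@&1", "FC@&2"),
                               frame = 1:3,
                               coordinate = c("x", "y", "z")))

# long format: one row per idx/frame/coordinate combination
long <- as.data.frame(as.table(value), responseName = "value",
                      stringsAsFactors = FALSE)

# wide format: one row per idx/frame, with x, y, z as separate columns
wide <- reshape(long, idvar = c("idx", "frame"), timevar = "coordinate",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))

# split the composite id back into its parts (same "@&" separator as above)
parts <- do.call(rbind, strsplit(wide$idx, "@&", fixed = TRUE))
wide$pitch_type <- parts[, 1]
wide$pitch_id <- parts[, 2]
```

The resulting `wide` data frame has one row per pitch and frame, which is the shape ggplot2 expects for drawing trajectories.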
Thanks! This should work now.
Hi Carson,
I downloaded PITCHf/x data for all of 2014 and joined the atbat and pitch tables for Yu Darvish. But when I run the getLocations function, R keeps showing "Error in 0:(nplots - 1) : NA/NaN argument". Do you have any idea why this function doesn't work?
Thanks so much!
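One thing that might be worth ruling out (this is a guess, the cause is not confirmed anywhere in this thread) is missing or non-numeric values in the nine PITCHf/x parameters that the function selects with x0:az. A minimal base R sketch for checking this follows. `check_params` is a hypothetical helper, not part of pitchRx or the gist, and the toy data is made up.

```r
# the nine PITCHf/x parameters selected by x0:az in getLocations
pfx_params <- c("x0", "y0", "z0", "vx0", "vy0", "vz0", "ax", "ay", "az")

# Hypothetical helper: report NA counts per parameter and drop incomplete rows
check_params <- function(dat, params = pfx_params) {
  absent <- setdiff(params, names(dat))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  # coerce to numeric; values that cannot be parsed become NA
  dat[params] <- lapply(dat[params], function(x) {
    suppressWarnings(as.numeric(as.character(x)))
  })
  n_na <- colSums(is.na(dat[params]))
  message("NA count per parameter: ",
          paste(names(n_na), n_na, sep = "=", collapse = ", "))
  dat[complete.cases(dat[params]), , drop = FALSE]
}

# toy data (made-up values): the second row has a missing x0
toy <- data.frame(x0 = c(1, NA), y0 = c(50, 50), z0 = c(6, 6),
                  vx0 = c(1, 1), vy0 = c(-130, -130), vz0 = c(-5, -5),
                  ax = c(1, 1), ay = c(25, 25), az = c(-20, -20))
clean <- check_params(toy)
```

If the NA counts are all zero on the real data, the problem lies elsewhere.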
I think there is a bug related to the group_by(...) call. When I run
dat <- getLocations(pitches, pitcher_name, pitch_type, summarise = TRUE)
I get an error saying "Error: index out of bounds".
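A cheap first check for an error like this (again an assumption, since the thread does not identify the cause) is whether every column passed through ... actually exists in the data. The sketch below uses a hypothetical helper, `check_group_cols`, and a made-up one-row data frame.

```r
# Hypothetical helper: stop early if any requested grouping column is absent
check_group_cols <- function(dat, cols) {
  absent <- setdiff(cols, names(dat))
  if (length(absent) > 0) {
    stop("columns not found in data: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# toy data (made-up values) with both grouping columns present
toy_groups <- data.frame(pitcher_name = "A", pitch_type = "FF")
check_group_cols(toy_groups, c("pitcher_name", "pitch_type"))  # passes silently
```

If this passes on the real data, the grouping columns are at least present, and the error would have to come from a later step.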