@cavedave
Last active September 30, 2016 20:20
---
title: "Basketball Size"
output: html_notebook
---
This is a copy of this NFL visualisation: <http://noahveltman.com/nflplayers/>.
I couldn't find the R code to recreate it, but I did find basketball data at <https://github.com/simonwarchol/NBA-Height-Weight>. The resulting GIF is up at <http://imgur.com/dobmMWM>.
First, get Simon Warchol's data and stitch his CSVs together.
```{python}
# Stitch together all the yearly CSVs and add the year, taken from each file name (e.g. 1955.csv -> 1955).
import os
import csv

# Folder holding one CSV per season from the NBA-Height-Weight repository
directory = "basketball/simonwarchol-NBA-Height-Weight-7871d8b/CSVs/Yearly"

# Write tab-separated output so it matches sep="\t" in the R chunk below
with open("ttest.csv", "w", newline="") as ofile:
    writer = csv.writer(ofile, delimiter="\t", quotechar='"', quoting=csv.QUOTE_ALL)
    writer.writerow(["Name", "Height", "HeightFI", "Weight", "Year"])
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".csv"):
                year = os.path.splitext(file)[0]
                with open(os.path.join(root, file), "r", newline="") as ifile:
                    reader = csv.reader(ifile)
                    next(reader, None)  # skip each file's own header row
                    for row in reader:
                        writer.writerow(row + [year])
```
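Before switching to R, it can be worth checking the stitched file. The sketch below reads the tab-separated output back with Python's `csv` module and counts rows per `Year`. You would expect one count per season file. The function name `players_per_year` is hypothetical and not part of the original notebook.

```python
import csv
from collections import Counter


def players_per_year(lines):
    """Count data rows per Year in the tab-separated output of the stitching chunk.

    `lines` can be an open file or any iterable of text lines.
    """
    reader = csv.DictReader(lines, delimiter="\t")  # first line supplies the column names
    return Counter(row["Year"] for row in reader)
```

The following assumes `ttest.csv` is in the working directory: `with open("ttest.csv", newline="") as f: print(sorted(players_per_year(f).items()))`.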
Now load the data into R.
```{r}
mydata = read.csv("ttest.csv", header=TRUE, sep="\t")
head(mydata)
```
The first rows should look like this:

    > head(mydata)
             Name Height HeightFI Weight Year
    1 Don Anielak     79      6-7    190 1955
    2 Paul Arizin     76      6-4    190 1955

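Judging from these two rows, `HeightFI` seems to hold the same height written as feet-inches. For example, 6-7 is 6*12 + 7 = 79 inches, which matches `Height`. The quick check below works under that assumption. The helper `feet_inches_to_inches` is hypothetical.

```python
def feet_inches_to_inches(height_fi):
    """Convert a feet-inches string such as "6-7" into total inches."""
    feet, inches = height_fi.split("-")
    return int(feet) * 12 + int(inches)


# The two sample rows above: (Name, Height, HeightFI)
sample = [("Don Anielak", 79, "6-7"), ("Paul Arizin", 76, "6-4")]
for name, height, height_fi in sample:
    # Prints True when HeightFI agrees with the Height column
    print(name, feet_inches_to_inches(height_fi) == height)
```

If the assumption holds, `Height` alone is enough for the plot, which is why the next step drops `HeightFI`.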
Now keep only the height, weight and year, with weight rounded into 10 lb buckets.
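```{r}
library(dplyr)
# Keep only the columns the plot needs
mydata <- dplyr::select(mydata, Weight, Height, Year)
# Round weight into 10 lb buckets
mydata$Weight2 <- as.integer(round((mydata$Weight - 4) / 10) * 10)
sizePer <- mydata %>%
  group_by(Weight2, Height, Year) %>%
  mutate(countT = n()) %>%                  # players in this weight/height cell in this year
  group_by(Year) %>%
  mutate(countY = n()) %>%                  # all players in this year
  mutate(per = (countT / countY) * 100) %>% # the cell's share of that year's players, in %
  ungroup()
# Bin the percentages into 0, 1, 2, 3, 4 and 5+ for the colour scale
sizePer$bin <- cut(sizePer$per, breaks = c(-1:4, Inf), labels = c(as.character(0:4), "5+"))
```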
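To make the bucketing and percentage arithmetic concrete, here is a plain-Python re-implementation of the same steps. It is a sketch, not the author's code, and the function names are hypothetical.

Because of the `- 4`, a bucket's label is roughly the lower edge of its 10 lb band. Exact halves such as 189 or 199 lb are resolved by round-half-to-even, which Python's `round()` shares with R's. The percentage bins follow `cut(per, breaks = c(-1:4, Inf))`, which uses right-closed intervals, so a cell holding 0.5% of a season's players lands in bin "1".

```python
from collections import Counter


def weight_bucket(weight):
    # Mirrors the R expression round((Weight - 4) / 10) * 10;
    # Python's round(), like R's, sends exact halves to the nearest even number.
    return int(round((weight - 4) / 10) * 10)


def percent_bin(per):
    # Same intervals as cut(): (-1,0] -> "0", (0,1] -> "1", ..., (3,4] -> "4", (4,Inf] -> "5+"
    for upper, label in [(0, "0"), (1, "1"), (2, "2"), (3, "3"), (4, "4")]:
        if per <= upper:
            return label
    return "5+"


def cell_percentages(players):
    """players: list of (weight, height, year) tuples.

    Returns {(weight_bucket, height, year): percent of that year's players in the cell}.
    """
    cells = Counter((weight_bucket(w), h, y) for w, h, y in players)  # countT per cell
    per_year = Counter(y for _, _, y in players)                     # countY per year
    return {cell: 100 * n / per_year[cell[2]] for cell, n in cells.items()}
```

This sketch keeps one value per cell. The dplyr version uses `mutate` rather than `summarise`, so every player row carries its cell's percentage and `geom_tile` draws the same tile once per player.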
Now draw one height/weight heatmap per season and save the frames as an animated GIF.
```{r}
library(ggplot2)
library(dplyr)
library(animation)
# One frame per season, written out as an animated GIF
saveGIF({
  for (i in 1955:2014) {
    print(ggplot(sizePer %>% filter(Year == i),
                 aes(x = Weight2, y = Height, fill = bin)) +
      geom_tile(color = "white", size = 0.1) +
      theme_bw() +
      theme(legend.position = "top", plot.title = element_text(size = 30, face = "bold")) +
      coord_cartesian(xlim = c(130, 330), ylim = c(63, 91)) +
      scale_fill_manual("%", values = c("#fee5d9", "#fcbba1", "#fc9272", "#fb6a4a", "#de2d26", "#a50f15"), drop = FALSE) +
      annotate(x = 320, y = 63, geom = "text", label = i, size = 9) +             # season, bottom right
      annotate(x = 130, y = 63, geom = "text", label = "@iamreddave", size = 3) + # credit, bottom left (inside ylim)
      ylab("Height (inches)") +
      xlab("Weight (lbs)") +
      scale_x_continuous(breaks = seq(130, 330, by = 20)) +
      scale_y_continuous(breaks = seq(63, 91, by = 1)) +
      ggtitle("NBA players: Height and Weight over time")
    )
  }
}, interval = 0.25, ani.width = 900, ani.height = 600)
```
I couldn't find year, height and weight data for Premier League football, rugby, sumo (<http://fivethirtyeight.com/features/the-sumo-matchup-centuries-in-the-making/>) or baby height and weight over the years. If you can find any of these, or a similar cool dataset, please let me know.