Created
June 23, 2021 18:23
-
-
Save russellpierce/5210ea5a98b78f6c600e109805bd54c7 to your computer and use it in GitHub Desktop.
Does the increase in # of ranks with time change the way we look at the Tidy Tuesday park data for Texas?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# 2021-06-23: Tidy Tuesday | |
# Inspired by, and some code copied from Jesse Mostipak | |
# https://twitter.com/kierisi/status/1407171923633651717 | |
## Background ## | |
# Thought Experiment/Example: | |
# Imagine I'm the 5th tallest kid in a class of 40. | |
# Then my class size doubles to 80. | |
# Now I'm probably not the 5th tallest anymore. | |
# I'm probably like about the 10th tallest. | |
# I didn't get any shorter, but my context changed. | |
# So my rank changed. | |
# With regards to Jesse's post: | |
# I wondered (https://twitter.com/RussellSPierce/status/1407327943110631430?s=20) if the number of ranks increased over time. | |
# If they do, that could explain why the rankings of Texas parks fell over time. | |
## Load packages ## | |
library(tidyverse) | |
library(ggbump) | |
library(readr) | |
# One new packages: | |
library(patchwork) # allows positioning multiple ggplots together in the same image | |
# I dropped a couple packages from Jesse's example because I wasn't using them. | |
## Load Data ## | |
parks <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-22/parks.csv') | |
## Check for Number of Parks Increasing ## | |
# I actually check for the number of ranks increasing. | |
# But, without a lot of ties, the number of ranks and number of parks will be similar. | |
number_of_ranks_increase <- parks %>% | |
group_by(year) %>% | |
mutate(`Number of Ranks` = max(rank)) %>% | |
ggplot(aes(x=year, y=`Number of Ranks`)) + | |
geom_line() + | |
labs(title = "Number of Ranks Increases Between 2012 and 2016") | |
# This plot shows up at the top | |
## Copying Jesse's Plot ## | |
# Borrowing Jesse's list of Texas cities | |
texas_cities <- c("Dallas", "Austin", "Arlington, Texas", "Fort Worth", | |
"El Paso", "Houston", "San Antonio") | |
raw_ranks <- normalized_rank_data %>% | |
filter(city %in% texas_cities) %>% | |
ggplot(aes(year, rank, color = city)) + | |
geom_point(size = 3) + | |
geom_bump() + | |
theme_minimal() + | |
scale_y_reverse(name = "Rank") + | |
# I add a 'title' to her plot so I can tell it apart from the others in my final image | |
labs(title = "Raw Ranks Decrease with Time") | |
# This plot shows up at the bottom left | |
## Solve for a Changing Number of Ranks ## | |
# There are lot of ways to try to adjust when the number of ranks changes over time. | |
# I chose to just rescale the rank data between 0 and 1. | |
# I do that on a year by year basis (`group_by(year)`). | |
# If I rescale straight up, | |
# then the worst park in a given year would have a score of 1 | |
# and the best park would be a 0. | |
# I didn't want that. | |
# I like it when my low numbers are bad and high numbers are good. | |
normalized_rank_data <- parks %>% | |
group_by(year) %>% | |
mutate(relative_rank = rescale(-rank, to =c(0, 1))) %>% | |
select(year, rank, relative_rank, city, total_points) | |
# This is kind of similar to a 'percentile' so I decided to format the data that way | |
relative_ranks <- normalized_rank_data %>% | |
filter(city %in% texas_cities) %>% | |
ggplot(aes(year, relative_rank, color = city)) + | |
geom_point(size = 3) + | |
geom_bump() + | |
theme_minimal() + | |
labs(title = "Relative Ranks are Stable Across Time") + | |
scale_y_continuous(labels=percent, name = "Relative Rank") | |
# This plot shows up at the bottom right | |
## Display Plots ## | |
number_of_ranks_increase / {raw_ranks + relative_ranks} | |
## Summary ## | |
# The number of ranked parks increase between 2012 and 2016. | |
# This leads to a drop in the rank of the parks we looked at in Texas in the initial post. | |
# Rescaling the rank per year appears to indicate that Texas parks aren't getting worse. | |
# Texas parks looking okay y'all. :) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Data Source: https://github.com/rfordatascience/tidytuesday/tree/master/data/2021/2021-06-22