Skip to content

Instantly share code, notes, and snippets.

@russellpierce
Created June 23, 2021 18:23
Show Gist options
  • Save russellpierce/5210ea5a98b78f6c600e109805bd54c7 to your computer and use it in GitHub Desktop.
Save russellpierce/5210ea5a98b78f6c600e109805bd54c7 to your computer and use it in GitHub Desktop.
Does the increase in # of ranks with time change the way we look at the Tidy Tuesday park data for Texas?
# 2021-06-23: Tidy Tuesday
# Inspired by, and some code copied from Jesse Mostipak
# https://twitter.com/kierisi/status/1407171923633651717
## Background ##
# Thought Experiment/Example:
# Imagine I'm the 5th tallest kid in a class of 40.
# Then my class size doubles to 80.
# Now I'm probably not the 5th tallest anymore.
# I'm probably like about the 10th tallest.
# I didn't get any shorter, but my context changed.
# So my rank changed.
# With regards to Jesse's post:
# I wondered (https://twitter.com/RussellSPierce/status/1407327943110631430?s=20) if the number of ranks increased over time.
# If they do, that could explain why the rankings of Texas parks fell over time.
## Load packages ##
library(tidyverse)
library(ggbump)
library(readr)
# One new packages:
library(patchwork) # allows positioning multiple ggplots together in the same image
# I dropped a couple packages from Jesse's example because I wasn't using them.
## Load Data ##
parks <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-06-22/parks.csv')
## Check for Number of Parks Increasing ##
# I actually check for the number of ranks increasing.
# But, without a lot of ties, the number of ranks and number of parks will be similar.
number_of_ranks_increase <- parks %>%
group_by(year) %>%
mutate(`Number of Ranks` = max(rank)) %>%
ggplot(aes(x=year, y=`Number of Ranks`)) +
geom_line() +
labs(title = "Number of Ranks Increases Between 2012 and 2016")
# This plot shows up at the top
## Copying Jesse's Plot ##
# Borrowing Jesse's list of Texas cities
texas_cities <- c("Dallas", "Austin", "Arlington, Texas", "Fort Worth",
"El Paso", "Houston", "San Antonio")
raw_ranks <- normalized_rank_data %>%
filter(city %in% texas_cities) %>%
ggplot(aes(year, rank, color = city)) +
geom_point(size = 3) +
geom_bump() +
theme_minimal() +
scale_y_reverse(name = "Rank") +
# I add a 'title' to her plot so I can tell it apart from the others in my final image
labs(title = "Raw Ranks Decrease with Time")
# This plot shows up at the bottom left
## Solve for a Changing Number of Ranks ##
# There are lot of ways to try to adjust when the number of ranks changes over time.
# I chose to just rescale the rank data between 0 and 1.
# I do that on a year by year basis (`group_by(year)`).
# If I rescale straight up,
# then the worst park in a given year would have a score of 1
# and the best park would be a 0.
# I didn't want that.
# I like it when my low numbers are bad and high numbers are good.
normalized_rank_data <- parks %>%
group_by(year) %>%
mutate(relative_rank = rescale(-rank, to =c(0, 1))) %>%
select(year, rank, relative_rank, city, total_points)
# This is kind of similar to a 'percentile' so I decided to format the data that way
relative_ranks <- normalized_rank_data %>%
filter(city %in% texas_cities) %>%
ggplot(aes(year, relative_rank, color = city)) +
geom_point(size = 3) +
geom_bump() +
theme_minimal() +
labs(title = "Relative Ranks are Stable Across Time") +
scale_y_continuous(labels=percent, name = "Relative Rank")
# This plot shows up at the bottom right
## Display Plots ##
number_of_ranks_increase / {raw_ranks + relative_ranks}
## Summary ##
# The number of ranked parks increase between 2012 and 2016.
# This leads to a drop in the rank of the parks we looked at in Texas in the initial post.
# Rescaling the rank per year appears to indicate that Texas parks aren't getting worse.
# Texas parks looking okay y'all. :)
@russellpierce
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment