Last active
April 8, 2020 15:04
-
-
Save andrewbtran/690580868b99f717a677eab5d4a0f471 to your computer and use it in GitHub Desktop.
coronavirus county and regional disparity analysis
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
--- | |
title: "Coronavirus racial disparities" | |
author: "Andrew Ba Tran and Reis Thebault" | |
date: "4/7/2020" | |
output: | |
html_document: | |
code_folding: hide | |
--- | |
Counties where Black residents are the majority have three times the rate of median cases of coronavirus cases and have almost four times the average rate of deaths of counties. | |
```{r setup, wanring=F, message=F} | |
# This function checks if you don't have the correct packages installed yet | |
# If not, it will install it for you | |
packages <- c("tidyverse", "tidycensus", "ggrepel", | |
"knitr", "DT", "tibble") | |
if (length(setdiff(packages, rownames(installed.packages()))) > 0) { | |
install.packages(setdiff(packages, rownames(installed.packages())), repos = "https://cran.us.r-project.org") | |
} | |
# Loading libraries | |
library(tidyverse) | |
library(tidycensus) | |
library(knitr) | |
library(ggrepel) | |
library(DT) | |
library(tibble) | |
# Load your own Census API key in line 31 | |
# https://api.census.gov/data/key_signup.html | |
census_key <- "XXXXXXX" | |
census_api_key(census_key) | |
``` | |
```{r race_table, warning=F, message=F} | |
# Replace the date with the current one | |
what_day_is_it <- "04-07-2020" | |
# Bring data in data from JHU | |
corona <- read_csv(paste0("https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/", what_day_is_it, ".csv")) | |
# Downloading race data from the Census | |
# Calculating the percent for each county | |
county_race <- get_acs(geography = "county", | |
variables = c("B03002_003", | |
"B03002_004", | |
"B03002_006", | |
"B03002_012", | |
"B03002_007"), | |
summary_var = "B03002_001") %>% | |
mutate(pct = round(100 * (estimate/summary_est),2)) %>% | |
mutate(race=case_when( | |
variable=="B03002_003" ~"White", | |
variable=="B03002_004" ~"Black", | |
variable=="B03002_006" ~"Asian", | |
variable=="B03002_012" ~"Hispanic", | |
variable=="B03002_007" ~"American Indian", | |
TRUE ~ "Other" | |
)) | |
# narrowing down the number of columns | |
# to determine which race is the majority in the county | |
county_race_wide <- county_race %>% | |
ungroup() %>% | |
select(GEOID, summary_est, pct, race) %>% | |
group_by(GEOID) %>% | |
arrange(desc(pct)) %>% | |
slice(1) | |
# after we've categorized each county to figure out which is the race-majority | |
# we join it back to the original dataframe | |
# and calculate per capita for cases and deaths | |
corona_race_join <- corona %>% | |
left_join(county_race_wide, by=c("FIPS"="GEOID")) %>% | |
mutate(cases_per_capita=Confirmed/summary_est*100000, | |
deaths_per_capita=Deaths/summary_est*100000) %>% | |
filter(!is.na(race)) | |
# Calculating median and average cases/deaths per capita by county group | |
# This excludes counties with no cases and no deaths | |
corona_race_table <- corona_race_join %>% | |
group_by(race) %>% | |
summarize(counties=n(), | |
median_cases_per_capita=round(median(cases_per_capita, na.rm=T),2), | |
median_deaths_per_capita=round(median(deaths_per_capita, na.rm=T),2), | |
average_cases_per_capita=round(mean(cases_per_capita, na.rm=T),2), | |
average_deaths_per_capita=round(mean(deaths_per_capita, na.rm=T),2)) | |
# Cleaning up the column names | |
corona_race_table %>% | |
rename(`County majority`=race, Counties=counties, | |
`Median cases per 100k`=median_cases_per_capita, | |
`Median deaths per 100k`=median_deaths_per_capita, | |
`Average cases per 100k`=average_cases_per_capita, | |
`Average deaths per 100k`=average_deaths_per_capita) %>% | |
kable() | |
``` | |
This analysis is based on results from local jurisdictions and Census demographics data because little more than a dozen counties and states combined have released racial data of confirmed cases and deaths months after the first case was discovered in the U.S. | |
Since then some officials have sounded the alarm that Black communities are being disproportionately affected. | |
The Post found 13 jurisdictions offering numbers broken out by race: Four states, DC, and eight counties, most of them in Florida. [**Note:** Numbers in these paragraphs are already out of date] | |
Nine out of 13 jurisdictions that the Post has found releasing racial breakdowns have a large gap between the percent of Black residents who've died from coronavirus compared to the percent of the population there. | |
```{r yes, warning=F, message=F, fig.width=9, fig.height=7.5} | |
# This data is as of early April 7, 2020 | |
# You'll want to bring in your own data | |
# or update the tibble below on your own | |
regions <- tribble( | |
~Region, ~Percent_of_population, ~Percent_of_deaths, | |
"Milwaukee County, Wis.", 26, 73, | |
"Louisiana", 32, 70, | |
"D.C.", 46, 58, | |
"Chicago", 32, 67, | |
"Illinois", 14, 42, | |
"Michigan", 14, 41, | |
"North Carolina", 21, 38, | |
"Florida", 15, 16, | |
"Connecticut", 10, 16 | |
) | |
black_pop <- regions %>% | |
select(Region, percent=Percent_of_population) %>% | |
mutate(type="% of population") | |
black_deaths <- regions %>% | |
select(Region, percent=Percent_of_deaths) %>% | |
mutate(type="% of deaths") | |
slope <- rbind(black_pop, black_deaths) | |
slope$type <- factor(slope$type, levels = c("% of population", "% of deaths")) | |
ggplot() + | |
geom_line( | |
data = slope, | |
mapping = aes(y = percent, x = type, group = Region, color = Region), | |
size = 1, | |
alpha = 0.5 | |
) + | |
geom_text_repel( | |
data = subset(slope, type == "% of population"), | |
mapping = aes(y = percent, x = type, label = paste0(Region, " ", percent, "%")), | |
direction = "y", | |
nudge_x = -0.05, | |
hjust = 1, | |
point.padding = 0, | |
segment.size = 0.2 | |
) + | |
geom_text_repel( | |
data = subset(slope, type == "% of deaths"), | |
mapping = aes(y = percent, x = type, label = paste0(percent, "%")), | |
direction = "y", | |
hjust = 0, | |
nudge_x = 0.05, | |
point.padding = 0, | |
segment.size = 0.2 | |
) + | |
scale_color_discrete(guide = FALSE) + | |
theme_void() + | |
theme( | |
axis.text.x = element_text(size=12) | |
) + | |
labs(title="Black percent of population compared to share of coronavirus deaths") | |
``` | |
Chicago, Louisana, and Milawaukee County in Wisconsin have the largest disparity. | |
Black residents there make up little more than 30 percent of the population but consist of around 70 percent of those who died with coronavirus. | |
Florida and Illinois have the largest gap gaps-- nearly three times the population have died from the virus. | |
Nearly 41 percent of deaths in Michigan have been Black residents. | |
The racial breakdown is only released as a whole for the state-- in which 14 percent of the population is Black. But Michigan does release overall testing and deaths figures by county. | |
And Wayne County, where Detroit is located, is the largest county in state and has the highest cases/deaths per capita so far. | |
Their population is 38 percent Black, which would explain the 41 percent of Black deaths. But because Michigan does not release the racial breakdown by county, researchers can only calculate based on the paramaters of the data they're given to normalize with (14 percent). | |
Statewide, a big disparity also exists in Illinois, likely fueled by that gulf in Cook County. | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment