@tomhopper
Created October 29, 2017 21:51
Web scrape and display top charities for hurricane relief.
## Top Charities for Hurricane Harvey Relief
## According to both Charity Navigator and Charity Watch
## Approach:
## Scrape data from Charity Navigator and Charity Watch.
## Merge and display the intersection (common entries) of
## the two data sets.
## ** BROKEN ** As of 2017-10-29, Charity Navigator has changed their page
## and the organization of the table of charities.
## Libraries ####
library(rvest) # Web scraping
library(dplyr) # Data wrangling
## Download and clean the data ####
cn_url <- "https://www.charitynavigator.org/index.cfm?bay=content.view&cpid=5356&from=homepage"
cw_url <- "https://www.charitywatch.org/charitywatch-hot-topic/hurricane-maria-relief/81"
cn_df <- read_html(cn_url) %>%
  html_node(xpath = '//*[@id="list-right"]/table') %>%
  html_table() %>%
  setNames(c("Charity", "Rating")) %>%
  arrange(Charity)
cw_df <- read_html(cw_url) %>%
  html_node(xpath = '//*[@id="main_wrapper"]/div/table') %>%
  html_table() %>%
  setNames(c("Charity", "Rating")) %>%
  arrange(Charity)
## Manually fix mismatched charity names
rename_vec <- c(`Direct Relief & Direct Relief Foundation` = "Direct Relief")
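## Use %in% rather than ==, so the lookup stays correct if more names are
## added to rename_vec (== would recycle the names vector element-wise).
needs_rename <- cw_df$Charity %in% names(rename_vec)
cw_df$Charity[needs_rename] <- unname(rename_vec[cw_df$Charity[needs_rename]])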
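## Sketch (not part of the original script): a way to spot names that still
## fail to match after the manual fixes, as candidates for rename_vec.
## `unmatched_names` is a hypothetical helper; it assumes both data frames
## carry a character "Charity" column, as set up above.
unmatched_names <- function(x, y) {
  # Names present in x$Charity but absent from y$Charity
  base::setdiff(x$Charity, y$Charity)
}
## Possible usage, once both tables have been scraped:
##   unmatched_names(cn_df, cw_df)  # listed by Charity Navigator only
##   unmatched_names(cw_df, cn_df)  # listed by Charity Watch only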
## Display intersection of results ####
cn_df %>% inner_join(cw_df, by = "Charity") %>% select(Charity)