@dylanjm
Created April 21, 2018 15:04
A script that uses purrr to automate the wrangling and cleaning of economic data
library(tidyverse)
library(rio)
library(rvest)
library(janitor)
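# Fetch the three-letter country codes (every run of three capital letters on the page)
country_codes <- read_html("http://web.stanford.edu/~chadj/countrycodes6.3") %>%
  html_text() %>%
  str_extract_all("[A-Z]{3}") %>%  # returns a list with one character vector per input
  unlist() %>%
  unique() %>%                      # avoid downloading the same country twice
  sort()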
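# Offline illustration of the extraction step above, run on a made-up string
# (not the real contents of countrycodes6.3): str_extract_all() returns a list
# with one character vector per input, so unlist() flattens it before sorting.
demo_codes <- sort(unlist(stringr::str_extract_all(
  "USA United States\nFRA France\nAGO Angola", "[A-Z]{3}"
)))
demo_codes
#> [1] "AGO" "FRA" "USA"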
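# Save the country codes to a local .csv file so we always have a copy
# (write_csv() fails if the data/ folder does not exist yet, so create it first)
dir.create(here::here("data"), showWarnings = FALSE)
tibble(codes = country_codes) %>%
  write_csv(here::here("data/country_codes.csv"))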
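# Download one country's snapshot spreadsheet and clean it into a numeric table;
# this is the function we map over every country code
read_and_clean <- function(country_code = "USA") {
  dat_url <- paste0("http://www.stanford.edu/~chadj/snapshots/", country_code, ".xls")
  import(dat_url, skip = 9) %>%                 # skip the first 9 rows of the sheet
    clean_names() %>%                           # convert column names to snake_case
    na.omit() %>%                               # drop rows with any missing value
    filter(population != "NaN") %>%             # drop rows where population is "NaN"
    mutate(across(everything(), as.numeric)) %>%
    mutate(country = country_code)              # tag each row with its country code
}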
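# Offline sketch of the cleaning steps inside read_and_clean(), applied to a
# small hypothetical table instead of a downloaded snapshot. The column names
# and values are invented for illustration; the native pipe |> needs R >= 4.1.
toy_raw <- data.frame(
  Year = c("1960", "1961", "1962"),
  Population = c("NaN", "10.5", NA),
  `GDP per capita` = c("1.2", "1.3", "1.4"),
  check.names = FALSE
)
toy_clean <- toy_raw |>
  janitor::clean_names() |>              # Year -> year, GDP per capita -> gdp_per_capita
  na.omit() |>                           # drops the 1962 row (missing population)
  dplyr::filter(population != "NaN") |>  # drops the 1960 row ("NaN" string)
  dplyr::mutate(dplyr::across(dplyr::everything(), as.numeric)) |>
  dplyr::mutate(country = "XYZ")         # hypothetical country code
toy_clean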
# Wrap the function with possibly() so a failed download or parse returns NULL
# instead of stopping the loop over all countries
possibly_read_and_clean <- possibly(read_and_clean, otherwise = NULL)
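# Illustration of possibly() with a hypothetical stand-in for a failing
# download (fails_on_bad is not part of the original script): the wrapped
# function returns `otherwise` (NULL) instead of throwing, so one bad country
# code no longer aborts the whole loop.
fails_on_bad <- function(code) {
  if (code == "BAD") stop("no snapshot for ", code)
  code
}
safe_demo <- purrr::possibly(fails_on_bad, otherwise = NULL)
demo_results <- lapply(c("USA", "BAD", "FRA"), safe_demo)
str(demo_results)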
# Read every country, drop the NULLs left by failed downloads, and stack the
# remaining tables into one data frame
final_clean_dat <- map(country_codes, possibly_read_and_clean) %>%
  compact() %>%
  bind_rows()
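# Sketch of the combine step on hypothetical in-memory results: compact()
# drops the NULL a failed country leaves behind, and bind_rows() stacks the
# remaining data frames into one.
demo_list <- list(
  data.frame(year = 1960, country = "USA"),
  NULL,
  data.frame(year = 1960, country = "FRA")
)
demo_combined <- dplyr::bind_rows(purrr::compact(demo_list))
demo_combined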
# Save the combined data frame for later analysis
write_csv(final_clean_dat, here::here("data/final_gdp_data.csv"))