# copied from original "load_data.R" program created for 2016 data
# primary change was that there are more data fields in 2017
library(data.table)
library(magrittr)
library(readr)
library(fst)
# load core data --------------------------------------------------------------------
# for reading fixed-width files, which are files with no delimiter (see readr package and read_fwf())
# col_widths is a vector of column widths (e.g., c(8, 4, 2, 9))
# input_file is a character string with the path to the input file (e.g., "./data/read.txt")
# on a 300 MB file with 143 columns, timings on a 2018 MacBook Pro were as follows:
#   read_fwf from readr package: 10.8 sec
#   non-parallel use of gawk: 10.5 sec
#   parallel use of gawk: 4.4 sec (below function)
flat_fread <- function(col_widths, input_file){
  col_spec <- paste0(col_widths, collapse = " ")
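For comparison, the fixed-width layout that `col_widths` describes can be read with base R alone. This is a minimal sketch, not the gawk-based approach of `flat_fread()`; the sample records, column names, and column types are made up for illustration.

```r
# Hypothetical layout: three columns of widths 8, 4, and 2 characters
col_widths <- c(8, 4, 2)

# Build three illustrative fixed-width records and write them to a temp file
records <- sprintf(
  "%-8s%-4s%2d",
  c("20170101", "20170102", "20170103"),
  c("AB", "CD", "EF"),
  c(7L, 12L, 3L)
)
input_file <- tempfile(fileext = ".txt")
writeLines(records, input_file)

# read.fwf() splits each line at the given widths; extra arguments are
# passed on to read.table()
dt <- utils::read.fwf(
  input_file,
  widths      = col_widths,
  col.names   = c("date", "code", "count"),   # assumed names
  colClasses  = c("character", "character", "integer"),
  strip.white = TRUE
)
print(dt)
```

Base `read.fwf()` is convenient for small files like this one; the timings quoted in the comments cover only readr and gawk, so no speed claim is made for it here.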
# this loads the 2016 NIS fixed width (asc) files into R
# it also saves the result as an fst file for much faster re-reading into R
library(data.table)
library(readr)
library(fst)
# load core data --------------------------------------------------------------------
nis_specs <- fread("./docs/nis_specs_core.csv")
# based on https://cran.r-project.org/web/packages/survival/vignettes/splines.pdf from Terry Therneau
# start with termplot without the plot to return results for all coefficients in the model
# y is the object in which the coxph model results have been saved
d <- termplot(y, se = TRUE, plot = FALSE)
# takes the termplot object (tp_obj), a specific variable name from the model as a string (var_name), and outputs a plot
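To make the shape of the `termplot()` result concrete, here is a minimal sketch that fits an illustrative model and converts one term to hazard ratios with approximate 95% bands. The `lung` data, the model formula, and the `term_to_hr()` helper are assumptions for illustration; the original plotting function is not shown in this listing, and this sketch stops at the data frame rather than drawing a plot.

```r
library(survival)

# Hypothetical model: the lung data and this formula are illustrative only
y <- coxph(Surv(time, status) ~ sex + pspline(age, df = 4), data = lung)

# One data frame per model term, each with columns x, y, and se
d <- termplot(y, se = TRUE, plot = FALSE)

# Hypothetical helper: exponentiate one term's fitted values and
# +/- 1.96 * se bands (values are relative to termplot's own centering)
term_to_hr <- function(tp_obj, var_name) {
  term <- tp_obj[[var_name]]
  data.frame(
    x     = term$x,
    hr    = exp(term$y),
    lower = exp(term$y - 1.96 * term$se),
    upper = exp(term$y + 1.96 * term$se)
  )
}

# Use the second term (the spline on age) by position
hr_age <- term_to_hr(d, names(d)[2])
head(hr_age)
```

The resulting data frame can be passed to any plotting function, for example as a line for `hr` with a ribbon between `lower` and `upper`.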
# load libraries --------------------------------------------------------------------
library(survival)
library(data.table)
library(magrittr)
library(ggplot2)
options(stringsAsFactors = FALSE, scipen = 10)
# load libraries --------------------------------------------------------------------
library(data.table)
library(feather)
# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------
pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
# main web page: "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DESample01.html"
# list of files to be downloaded for each 1/20 of the data
dl_list <-
  c(
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_Beneficiary_Summary_File_Samplezzz.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzB.zip",
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Inpatient_Claims_Samplezzz.zip",
    "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Outpatient_Claims_Samplezzz.zip",
    "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip",
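The `zzz` in each URL appears to be a placeholder for the sample identifier. Assuming that is the case, and that there are 20 samples (an inference from the "1/20 of the data" comment, not something this listing confirms), the per-sample URLs could be built as sketched below. The `url_templates` vector, the `sample_urls()` helper, and the `"_1"` style of tag are hypothetical; only two of the templates are repeated here.

```r
# Two of the URL templates from the list, repeated for a self-contained example
url_templates <- c(
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip"
)

# Hypothetical helper: replace the literal "zzz" with a sample tag.
# The tag format ("_1", "_2", ...) is an assumption.
sample_urls <- function(templates, sample_tag) {
  gsub("zzz", as.character(sample_tag), templates, fixed = TRUE)
}

# URLs for the first sample only
urls_sample_1 <- sample_urls(url_templates, "_1")

# URLs for all 20 assumed samples, as a list with one element per sample
all_urls <- lapply(paste0("_", 1:20), function(tag) sample_urls(url_templates, tag))

print(urls_sample_1)
```

Keeping the templates separate from the substitution step means the same list can drive the download loop for every sample.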
# ---------- US Part D Drug prices 2013 ---------- #
library(data.table)
library(magrittr)
# data from http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip
# 500 MB ZIP file download, 2.9 GB uncompressed
pde <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
tf <- tempfile()
download.file(pde, tf)
library(magrittr)
library(rvest)
library(xml2)
get_nhanes_listing <- function(){
  nhanes_url <- "http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx"
  tbl <- xml2::read_html(nhanes_url)
  table_text <-
    rvest::html_table(tbl) %>%
    data.frame(stringsAsFactors = FALSE) # just gets table, not hyperlinks in table
  names(table_text) <- gsub("\\.", "_", names(table_text)) %>% tolower()