markdanese’s gists

markdanese / load_nis_data_2017.R

Created September 10, 2021 01:24

NIS 2017 load script

	# copied from original "load_data.R" program created for 2016 data
	# primary change was that there are more data fields in 2017

	library(data.table)
	library(magrittr)
	library(readr)
	library(fst)


	# load core data --------------------------------------------------------------------

markdanese / flat_fread.R

Last active January 9, 2024 09:23

data.table fread fixed width file reader

	# for reading fixed-width files, which are files with no delimiter (see readr package and read_fwf())
	# col_widths is a vector of column widths (e.g., c(8, 4, 2, 9))
	# input_file is a character string giving the path to the input file (e.g., "./data/read.txt")
	# on a 300 MB file with 143 columns, timings on a 2018 MacBook Pro were as follows:
	# read_fwf from readr package: 10.8 sec
	# non-parallel use of gawk: 10.5 sec
	# parallel use of gawk: 4.4 sec (below function)

	flat_fread <- function(col_widths, input_file){
	col_spec <- paste0(col_widths, collapse = " ")  # space-separated widths, e.g. "8 4 2 9"
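To make the role of `col_widths` concrete, here is a minimal base-R sketch that splits fixed-width lines by cumulative character offsets. It is not the gist's gawk-based implementation, which is truncated in this listing; `split_fixed_width` and the sample records are hypothetical, made up purely for illustration.

```r
# Hypothetical helper, not from the original gist: split fixed-width
# text lines into columns using only base R.
split_fixed_width <- function(lines, col_widths) {
  ends   <- cumsum(col_widths)       # last character position of each field
  starts <- ends - col_widths + 1    # first character position of each field
  fields <- lapply(seq_along(col_widths), function(i) {
    substring(lines, starts[i], ends[i])
  })
  names(fields) <- paste0("V", seq_along(col_widths))
  as.data.frame(fields, stringsAsFactors = FALSE)
}

# Made-up sample records with field widths 8, 4, 2, and 9
sample_lines <- c("20170101ABCD12123456789",
                  "20171231WXYZ07987654321")
parsed <- split_fixed_width(sample_lines, c(8, 4, 2, 9))
print(parsed)
```

A vectorized split like this is easy to check by hand, but it makes no attempt to match the speed of the gawk pipeline timed in the comments.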

markdanese / nis2016_hospital_read.R

Last active March 11, 2025 17:32

National Inpatient Sample (NIS) read program

	# this loads the 2016 NIS fixed width (asc) files into R
	# it also saves the result as an fst file for much faster re-reading into R

	library(data.table)
	library(readr)
	library(fst)

	# load core data --------------------------------------------------------------------

	nis_specs <- fread("./docs/nis_specs_core.csv")
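The comments say the parsed result is saved as an fst file for faster re-reading. A minimal sketch of that round trip, assuming the `fst` and `data.table` packages are installed; the tiny table and the temporary path are made-up stand-ins, not real NIS data or the gist's actual file names.

```r
library(data.table)
library(fst)

# Made-up stand-in for a parsed NIS table (not real NIS data)
core <- data.table(id = 1:3, age = c(54L, 67L, 71L))

fst_path <- tempfile(fileext = ".fst")  # hypothetical output location
write_fst(core, fst_path)               # save once, after the slow fixed-width parse

# Later sessions can skip the parse and read the fst file directly
core_again <- read_fst(fst_path, as.data.table = TRUE)
```

Passing `as.data.table = TRUE` returns a data.table rather than a plain data frame, which fits the rest of the script's use of `data.table`.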

markdanese / termplot_coxph.R

Last active December 8, 2017 19:13

Plot spline based coefficients from a coxph model from the survival package in R

	# based on https://cran.r-project.org/web/packages/survival/vignettes/splines.pdf from Terry Therneau

	# start with termplot without the plot to return results for all coefficients in the model

	# y is the object in which the coxph model results have been saved

	d <- termplot(y, se = TRUE, plot = FALSE)


	# takes the termplot object (tp_obj), a specific variable name from the model as a string (var_name), and outputs a plot
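The plotting function itself is cut off in this listing. As a rough sketch of the idea, following the general pattern in the Therneau vignette cited in the comments, the code below fits an assumed example model on the `lung` data that ships with the survival package and plots one term on the relative-hazard scale. `plot_term` is a hypothetical helper, not the gist's function, and the choice to center the curve at the value nearest the median is an assumption.

```r
library(survival)

# Assumed example model; the lung data ships with the survival package
fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex, data = lung)

# One data frame per model term, with columns x, y (log hazard scale), and se
d <- termplot(fit, se = TRUE, plot = FALSE)

# Hypothetical plotting helper, not the gist's own function
plot_term <- function(tp_obj, var_name) {
  term <- tp_obj[[var_name]]
  # Use the fitted value nearest the median of x as the reference level
  center <- term$y[which.min(abs(term$x - median(term$x)))]
  est <- term$y - center
  matplot(term$x,
          exp(cbind(est, est - 1.96 * term$se, est + 1.96 * term$se)),
          type = "l", lty = c(1, 2, 2), col = 1, log = "y",
          xlab = var_name, ylab = "Relative hazard")
  invisible(term)
}

plot_term(d, names(d)[1])
```

The solid line is the estimate and the dashed lines are pointwise intervals built from `se`, all exponentiated so the y-axis reads as a hazard ratio relative to the reference level.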

markdanese / adjusted_survival.R

Last active June 15, 2017 01:36

Simple approach to generating adjusted survival curves

	# load libraries --------------------------------------------------------------------

	library(survival)
	library(data.table)
	library(magrittr)
	library(ggplot2)

	options(stringsAsFactors = FALSE, scipen = 10)

markdanese / feather_test.R

Last active April 22, 2016 11:35

A test of the new feather package in R using Medicare Part D drug reimbursement data


	# load libraries --------------------------------------------------------------------

	library(data.table)
	library(feather)

	# US Part D Drug prices 2013: 500 MB zip, 2.9 GB uncompressed -----------------------

	pde_link <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
	tf <- tempfile()

markdanese / download_synpuf.R

Last active January 14, 2021 04:32

An R script to download the Medicare SynPUF (synthetic public use files)
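The URLs in this script contain the literal text `zzz`, which appears to be a placeholder for the sample number (the comments describe the data as split into 1/20 pieces). Since the script is truncated in this listing, the following is only a sketch of how such templates could be expanded; `expand_sample` is a hypothetical helper and the 1-to-20 range is an assumption.

```r
# Two of the URL templates from the script; "zzz" is assumed to be a
# placeholder for the sample number (1 through 20)
templates <- c(
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
  "http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip"
)

# Hypothetical helper: fill in the placeholder for one sample number
expand_sample <- function(sample_num, url_templates) {
  gsub("zzz", as.character(sample_num), url_templates, fixed = TRUE)
}

sample_urls <- unlist(lapply(1:20, expand_sample, url_templates = templates))

# Downloading is left commented out so the sketch runs offline:
# for (u in sample_urls) {
#   download.file(u, file.path(tempdir(), basename(u)), mode = "wb")
# }
```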

	# main web page: "https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DESample01.html"
	# list of files to be downloaded for each 1/20 of the data
	dl_list <-
	c(
	"https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_Beneficiary_Summary_File_Samplezzz.zip",
	"http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzA.zip",
	"http://downloads.cms.gov/files/DE1_0_2008_to_2010_Carrier_Claims_SamplezzzB.zip",
	"https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Inpatient_Claims_Samplezzz.zip",
	"https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/DE1_0_2008_to_2010_Outpatient_Claims_Samplezzz.zip",
	"http://downloads.cms.gov/files/DE1_0_2008_to_2010_Prescription_Drug_Events_Samplezzz.zip",

markdanese / part_d.R

Last active October 8, 2015 04:20

script to read in Medicare Part D Prescriber data for 2013

	# ---------- US Part D Drug prices 2013 ---------- #
	library(data.table)
	library(magrittr)

	# data from http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip
	# 500 MB ZIP file download, 2.9 GB uncompressed

	pde <- "http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/PartD_Prescriber_PUF_NPI_DRUG_13.zip"
	tf <- tempfile()
	download.file(pde, tf)

markdanese / get_nhanes.R

Last active May 2, 2023 06:42

Scrape NHANES website and generate listing of all data (.xpt) and documentation (.htm) files

	library(magrittr)
	library(rvest)
	library(xml2)
	get_nhanes_listing <- function(){
	nhanes_url <- "http://wwwn.cdc.gov/Nchs/Nhanes/Search/DataPage.aspx"
	tbl <- xml2::read_html(nhanes_url)
	table_text <-
	rvest::html_table(tbl) %>%
	data.frame(stringsAsFactors = FALSE) # just gets table, not hyperlinks in table
	names(table_text) <- gsub("\\.", "_", names(table_text)) %>% tolower()
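As the comment notes, `html_table()` returns only the cell text, so the hyperlinks have to be collected separately. The rest of the function is not shown in this listing; the sketch below illustrates one way to pull `href` attributes with `xml2` and sort them into data and documentation links. The HTML snippet and its paths are made up for illustration and are not copied from the NHANES site.

```r
library(xml2)

# Made-up HTML standing in for one row of the NHANES listing table
html <- '<table><tr>
  <td>Demographics</td>
  <td><a href="/Nchs/Nhanes/2013-2014/DEMO_H.htm">DEMO_H Doc</a></td>
  <td><a href="/Nchs/Nhanes/2013-2014/DEMO_H.XPT">DEMO_H Data</a></td>
</tr></table>'

page <- read_html(html)

# Collect the href attribute of every anchor in the page
links <- xml_attr(xml_find_all(page, ".//a"), "href")

# Separate data files (.xpt) from documentation pages (.htm)
data_links <- links[grepl("\\.xpt$", links, ignore.case = TRUE)]
doc_links  <- links[grepl("\\.htm$", links, ignore.case = TRUE)]
```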

Mark Danese markdanese