tomhopper’s gists

tomhopper / Top_Charities_Hurricane_X.R

Created October 29, 2017 21:51

Web scrape and display top charities for hurricane relief.

	## Top Charities for Hurricane Harvey Relief
	## According to both Charity Navigator and Charity Watch
	## Approach:
	## Scrape data from Charity Navigator and Charity Watch.
	## Merge and display the intersection (common entries) of
	## the two data sets.
	## BROKEN: as of 2017-10-29, Charity Navigator has changed both their page
	## and the layout of the table of charities, so the scrape no longer works.

	## Libraries ####
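The merge step described in the comments above could be sketched with base R's merge(), which by default keeps only rows whose key appears in both inputs. The two data frames, their column names, and their values are made-up placeholders, not scraped results, and this is not necessarily how the gist itself does the merge.

```r
# Hypothetical stand-ins for the two scraped tables (placeholder values only)
navigator <- data.frame(charity = c("Org A", "Org B", "Org C"),
                        navigator_score = c(95, 90, 88),
                        stringsAsFactors = FALSE)
watch <- data.frame(charity = c("Org B", "Org C", "Org D"),
                    watch_grade = c("A", "A-", "B+"),
                    stringsAsFactors = FALSE)

# merge() defaults to an inner join: only charities present in both tables remain
top_both <- merge(navigator, watch, by = "charity")
print(top_both)
```

In practice the charity names from two sites rarely match exactly, so some name clean-up before the merge is likely needed.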

tomhopper / SOCR_Data_25000_Human_Height_Weight.R

Created October 29, 2017 21:45

SOCR Data - 25,000 Records of Human Heights (in) and Weights (lbs)

	## Height and weight of 18-year-olds
	## from Hong Kong 1993 Growth Survey data,
	## simulated by SOCR from reported summary statistics
	## Heights in inches
	## Weights in pounds
	## Explanation \url{http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights}
	## Data \url{http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html}

	## Libraries ####
	library(rvest) # Web scraping
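As a rough illustration of the scraping step, the sketch below parses an HTML table with rvest. It assumes rvest 1.0 or later (for html_element()), and it uses a small inline HTML string with placeholder values instead of the live SOCR page, so the column names and numbers here are assumptions.

```r
library(rvest)

# Toy HTML standing in for the SOCR page; the values are placeholders
page <- minimal_html('
  <table>
    <tr><th>Index</th><th>Height</th><th>Weight</th></tr>
    <tr><td>1</td><td>65.5</td><td>112.5</td></tr>
    <tr><td>2</td><td>71.0</td><td>136.0</td></tr>
  </table>')

# Select the first <table> element and convert it to a data frame
hw <- html_table(html_element(page, "table"))
print(hw)
```

For the real data, read_html() on the data URL given in the comments would take the place of minimal_html().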

tomhopper / Child_Height_Weight_Stats.R

Created October 29, 2017 12:07

Growth chart summary statistics for Hong Kong children, ages 6 to 18, for 1963, 1993, 2005/6

	## Download growth chart summary statistics for Hong Kong children, ages 6 to 18, for 1963, 1993, 2005/6
	## Data from
	## So, Hung-Kwan et al. “Secular Changes in Height, Weight and Body Mass Index in Hong Kong Children.” BMC Public Health 8 (2008): 320. PMC. Web. 29 Oct. 2017.
	## Article at \url{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572616/}
	## PMC Copyright and reuse terms: \url{https://www.ncbi.nlm.nih.gov/pmc/about/copyright/}
	## Heights in cm
	## Weights in kg

	## Libraries ####
	library(rvest)

tomhopper / model_fit_stats.R

Created June 19, 2017 23:39

Accepts one or more lm objects and returns a single data frame containing the fit statistics R-squared, adjusted R-squared, predictive R-squared and PRESS for each model.

	#' Model Fit Statistics
	#' @description Returns lm model fit statistics R-squared, adjusted R-squared,
	#' predictive R-squared and PRESS.
	#' Thanks to John Mount for his 6-June-2014 blog post, "R style tip: prefer functions that return data frames" for
	#' the idea \url{http://www.win-vector.com/blog/2014/06/r-style-tip-prefer-functions-that-return-data-frames}
	#' @param ... One or more \code{lm()} models.
	#' @return A data frame with rows for R-squared, adjusted R-squared, predictive R-squared and PRESS statistics, and a column for each model passed to the function.
	model_fit_stats <- function(...) {
	var_names <- as.character(match.call())[-1]
	dots <- list(...)
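The preview above stops before the statistics are computed. One common way to get PRESS and predictive R-squared for an lm fit, without refitting the model n times, uses the leave-one-out identity e_i / (1 - h_ii). The helper names below are hypothetical and the one-row layout is a simplification, so this is a sketch of the technique and not the body of model_fit_stats().

```r
# PRESS: sum of squared leave-one-out prediction errors, e_i / (1 - h_ii)
press_sketch <- function(model) {
  sum((residuals(model) / (1 - hatvalues(model)))^2)
}

# Predictive R-squared: 1 - PRESS / total sum of squares
pred_r2_sketch <- function(model) {
  y <- model.response(model.frame(model))
  1 - press_sketch(model) / sum((y - mean(y))^2)
}

# Example on a built-in data set
fit <- lm(mpg ~ wt, data = mtcars)
fit_stats <- data.frame(
  r.squared      = summary(fit)$r.squared,
  adj.r.squared  = summary(fit)$adj.r.squared,
  pred.r.squared = pred_r2_sketch(fit),
  PRESS          = press_sketch(fit)
)
print(fit_stats)
```

Predictive R-squared is never larger than ordinary R-squared, and a large gap between the two suggests overfitting.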

tomhopper / strip_na_rows.R

Created June 19, 2017 23:38

Strips rows containing only NA values from a supplied data frame, and returns the resulting data frame.

	#' Remove rows from data frame containing only NA in pipe-friendly manner
	#' @description Accepts a data frame and strips out any rows
	#' containing only \code{NA} values, then returns the resulting data frame.
	#' @param the_df A data frame
	#' @return A data frame
	#' @source \url{http://stackoverflow.com/a/6437778}
	strip_na_rows <- function(the_df) {
	the_df <- the_df[rowSums(is.na(the_df)) != ncol(the_df), , drop = FALSE]
	return(the_df)
	}
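A quick check of the row-filtering idiom on a toy data frame. The function name and data are hypothetical; the point is that the filtered result must be returned (or assigned), and that only rows where every column is NA are dropped.

```r
# Keep rows that have at least one non-NA value
strip_all_na_rows_sketch <- function(the_df) {
  the_df[rowSums(is.na(the_df)) != ncol(the_df), , drop = FALSE]
}

toy <- data.frame(a = c(1, NA, 3),
                  b = c("x", NA, NA),
                  stringsAsFactors = FALSE)

# Row 2 is all NA and is removed; row 3 is only partly NA and is kept
cleaned <- strip_all_na_rows_sketch(toy)
print(cleaned)
```

Because it takes and returns a data frame, the function can sit in a pipe, for example toy %>% strip_all_na_rows_sketch().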

tomhopper / dplyr_cumsum_column.R

Last active August 7, 2020 17:39

Add a cumulative sum column to a data frame using dplyr

	df %>%
	group_by(id) %>%
	mutate(cumsum = cumsum(value)) %>%
	ungroup()

	# from \url{http://stackoverflow.com/a/21818500/393354}
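To see what the pipeline above produces, here is a self-contained run on a small made-up data frame. The columns id and value match the names used in the snippet; the values are illustrative.

```r
library(dplyr)

df <- data.frame(id = c("a", "a", "b", "b", "b"),
                 value = c(1, 2, 10, 20, 30))

# The running total restarts at the first row of each id
result <- df %>%
  group_by(id) %>%
  mutate(cumsum = cumsum(value)) %>%
  ungroup()

print(result)
```

The running total follows the current row order within each group, so sort with arrange() first if the order matters.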

tomhopper / addNewData.R

Created October 9, 2016 00:12 — forked from dfalster/addNewData.R

The function addNewData.R modifies a data frame with a lookup table. This is useful where you want to supplement data loaded from file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included. This function runs some checks on the new table to ensure it has correct variable names and val…

	##' Modifies 'data' by adding new values supplied in newDataFileName
	##'
	##' newDataFileName is expected to have columns
	##' c(lookupVariable,lookupValue,newVariable,newValue,source)
	##'
	##' Within the column 'newVariable', replace values that
	##' match 'lookupValue' within column 'lookupVariable' with the value
	##' 'newValue'. If 'lookupVariable' is NA, then replace all elements
	##' of 'newVariable' with the value 'newValue'.
	##'
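The replacement rule described in the comments can be illustrated with a few lines of base R. This is a simplified sketch, not the implementation in addNewData.R: the function name apply_lookup_sketch, the toy data, and the decision to create missing columns as NA are all assumptions made for illustration.

```r
# Apply each row of a lookup table to `data`, following the documented rule
apply_lookup_sketch <- function(data, lookup) {
  for (i in seq_len(nrow(lookup))) {
    new_var <- lookup$newVariable[i]
    # Assumption: create the target column if it does not exist yet
    if (!new_var %in% names(data)) data[[new_var]] <- NA
    if (is.na(lookup$lookupVariable[i])) {
      # No lookup variable: set every element of the target column
      data[[new_var]] <- lookup$newValue[i]
    } else {
      # Otherwise set only the rows where lookupVariable equals lookupValue
      hit <- data[[lookup$lookupVariable[i]]] == lookup$lookupValue[i]
      data[[new_var]][which(hit)] <- lookup$newValue[i]
    }
  }
  data
}

toy <- data.frame(site = c("a", "b", "a"),
                  treatment = c("t1", "t2", "t1"),
                  stringsAsFactors = FALSE)
lookup <- data.frame(lookupVariable = c("site", "treatment", NA),
                     lookupValue    = c("a", "t2", NA),
                     newVariable    = c("region", "treatment", "project"),
                     newValue       = c("north", "control", "demo"),
                     source         = c("note 1", "note 2", "note 3"),
                     stringsAsFactors = FALSE)

updated <- apply_lookup_sketch(toy, lookup)
print(updated)
```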

tomhopper / nlsLM_s-curve.R

Last active August 29, 2023 16:46

Example of using nlsLM to fit an s-curve to data

	# Based on a post at \url{http://www.walkingrandomly.com/?p=5254}
	library(dplyr)
	library(ggplot2)
	library(minpack.lm)

	# The data to fit
	# tibble() replaces the deprecated data_frame()
	my_df <- tibble(x = c(0,15,45,75,105,135,165,195,225,255,285,315),
	y = c(0,0,0,4.5,19.7,39.5,59.2,77.1,93.6,98.7,100,100))

	# EDA to see the trend
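The preview ends before the fit itself. A minimal sketch of fitting an s-curve to these points with nlsLM() follows, assuming the minpack.lm package is installed. The three-parameter logistic form, the parameter names a, k, and x0, and the starting values are assumptions chosen for illustration; the gist may use a different model.

```r
library(minpack.lm)

# Same data as in the snippet above
my_df <- data.frame(x = c(0, 15, 45, 75, 105, 135, 165, 195, 225, 255, 285, 315),
                    y = c(0, 0, 0, 4.5, 19.7, 39.5, 59.2, 77.1, 93.6, 98.7, 100, 100))

# Logistic curve: a = upper asymptote, k = steepness, x0 = midpoint
fit <- nlsLM(y ~ a / (1 + exp(-k * (x - x0))),
             data = my_df,
             start = list(a = 100, k = 0.05, x0 = 150))

print(coef(fit))

# Fitted values on a fine grid, ready to overlay on a plot of the data
grid <- data.frame(x = seq(0, 315, by = 5))
grid$y_hat <- predict(fit, newdata = grid)
```

nlsLM() uses the Levenberg-Marquardt algorithm, which tends to tolerate rough starting values better than the default algorithm in nls().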

tomhopper / median_hourly_earnings.R

Created July 2, 2016 15:35

makeover: convert from two groups of side-by-side vertical bar charts to a more readable dot plot

	# from Conrad Hackett
	# Median hourly earnings
	# \url{https://twitter.com/conradhackett/status/748884076493475840}
	# makeover: convert from two groups of side-by-side vertical bar charts to a more readable dot plot
	# Demonstrates:
	# Use of in ggplot2
	# Creating dot plots
	# Combining color and shape in a single legend
	# Sorting a dataframe so that categorical data in one column is ordered by a second numerical column
	# Note: resulting graph displays best at about 450 pixels x 150 pixels
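Two of the techniques listed above, ordering a categorical column by a numeric one and sharing one legend between color and shape, can be sketched as follows. The data frame holds toy values for illustration only, not the earnings figures from the tweet, and the column names are assumptions.

```r
library(ggplot2)

# Toy values for illustration only
earnings <- data.frame(
  category = rep(c("Cat 1", "Cat 2", "Cat 3"), times = 2),
  group    = rep(c("Group X", "Group Y"), each = 3),
  value    = c(12, 20, 15, 10, 17, 14)
)

# Order the categories by their mean value (reorder() uses mean by default)
earnings$category <- reorder(earnings$category, earnings$value)

# Mapping colour and shape to the same variable, with the same legend title,
# makes ggplot2 merge them into a single legend
p <- ggplot(earnings, aes(x = value, y = category,
                          colour = group, shape = group)) +
  geom_point(size = 3) +
  labs(x = "Value", y = NULL, colour = "Group", shape = "Group")

print(p)
```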

tomhopper / dplyr_filter_ungroup.R

Created January 29, 2016 20:31 — forked from jhofman/dplyr_filter_ungroup.R

careful when filtering with many groups in dplyr

	library(dplyr)

	# create a dummy dataframe with 100,000 groups and 1,000,000 rows
	# and partition by group_id
	df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
	val = sample(1:100, 1e6, replace = TRUE)) %>%
	group_by(group_id)

	# filter rows with a value of 1 naively
	system.time(df %>% filter(val == 1))
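The preview stops after the naive timing. A natural comparison is to drop the grouping before filtering and restore it afterwards if needed. The sketch below uses a smaller data frame so it runs quickly; the object names are hypothetical, and whether the ungrouped version is faster depends on the dplyr version installed.

```r
library(dplyr)

set.seed(1)
# Smaller than the original example: 10,000 groups and 100,000 rows
df_small <- data.frame(group_id = sample(1:1e4, 1e5, replace = TRUE),
                       val = sample(1:100, 1e5, replace = TRUE)) %>%
  group_by(group_id)

# Filter while still grouped
grouped_time <- system.time(
  grouped_res <- df_small %>% filter(val == 1)
)

# Ungroup first, filter, then regroup
ungrouped_time <- system.time(
  ungrouped_res <- df_small %>%
    ungroup() %>%
    filter(val == 1) %>%
    group_by(group_id)
)

print(rbind(grouped = grouped_time, ungrouped = ungrouped_time))
```

Both paths return the same rows, because the condition val == 1 does not depend on the grouping.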

Tom Hopper tomhopper