ramhiser’s gists

ramhiser / boundieboxes.py

Created July 18, 2015 22:16

Boundieboxes from Pierce (OpenCV 3.0)

	import numpy as np
	import matplotlib.pyplot as plt
	import matplotlib.patches as mpatch
	import matplotlib.cm as cm
	import cv2
	import csv
	from sklearn import cluster


	def find_points(gray_img, color_img, num_points):

ramhiser / exercise-4.7.r

Created July 4, 2015 03:45

Exercise 4.7 (p.160) from Box, Hunter and Hunter (2005) "Statistics for Experimenters"

	library(dplyr)
	library(BHH2)

	df <- expand.grid(drivers=c('I', 'II', 'III', 'IV'),
	cars=1:4)
	df <- rbind(df, df) %>% arrange(drivers, cars)
	df$treatment <- c(
	rep(c('A', 'B', 'D', 'C'), each=2),
	rep(c('D', 'C', 'A', 'B'), each=2),
	rep(c('B', 'D', 'C', 'A'), each=2),

ramhiser / leaflet-county-explorer.r

Created May 4, 2015 18:25

Leaflet app in R to explore U.S. Census demographics by county

	# TODO: Add a Shiny dropdown to select demographic variable
	library(leaflet)
	library(noncensus)
	library(dplyr)

	data("counties", package="noncensus")
	data("county_polygons", package="noncensus")
	data("quick_facts", package="noncensus")

	counties <- counties %>%

ramhiser / add-na-rows.r

Last active August 29, 2015 14:20

Add NA row for each group in data frame

	# Useful for drawing polygons with leaflet
	# Polygons are stored in a `tbl_df` object with a mandatory `NA` row between each
	# polygon so that `leaflet` knows to stop drawing between each polygon.
	# Rather than magic, I found a slick way to do this via `dplyr::arrange`
	# See: (http://stackoverflow.com/a/25267681/234233).

	# Example using Iris data set:
	df_na <- matrix(NA, nrow=nlevels(iris$Species), ncol=ncol(iris) - 1)
	df_na <- tbl_df(as.data.frame(df_na))
	colnames(df_na) <- setdiff(colnames(iris), "Species")

ramhiser / austin-thd-stores.html

Created April 21, 2015 18:08

Basic Leaflet Example - Home Depot Stores in Austin Area


	<!DOCTYPE html>
	<html>
	<head>
	<title>Leaflet Example -- Home Depot Stores</title>
	<meta charset="utf-8" />

	<meta name="viewport" content="width=device-width, initial-scale=1.0">

	<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.7.3/leaflet.css" />

ramhiser / impute-naive.r

Created March 27, 2015 18:27

Naive imputation of missing data within an R data frame

	#' Naive imputation of missing data
	#'
	#' Imputes missing data in a data frame a column at a time, e.g., univariate.
	#' Missing numeric values are replaced with the median. Similarly, missing
	#' factor values are replaced with the mode.
	#'
	#' If \code{draw} is set to \code{TRUE}, missing data are drawn from a basic
	#' distribution to make the imputation slightly less naive. For continuous,
	#' values are drawn from a uniform distribution ranging from the min to max
	#' values observed within the column. For categorical, values are drawn from a

ramhiser / schools.r

Created February 15, 2015 04:41

Exercise 5.9a from Gelman BDA3

	# SAT scores data from Table 5.2 on page 120 of Gelman's BDA3 text
	y <- c(28, 8, -3, 7, -1, 1, 18, 12)
	sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)

	# Goal: Replicate calculations in Section 5.5
	# Instructions for posterior simulation given on page 118
	library(itertools2)

	# Equation 5.21 on page 117
	tau_posterior <- function(tau, y, sigma) {

ramhiser / one-hot.py

Last active April 7, 2021 06:44

Apply one-hot encoding to a pandas DataFrame

	import pandas as pd
	import numpy as np
	from sklearn.feature_extraction import DictVectorizer

	def encode_onehot(df, cols):
	"""
	One-hot encoding is applied to columns specified in a pandas DataFrame.

	Modified from: https://gist.github.com/kljensen/5452382

ramhiser / fill-product.py

Created February 10, 2015 02:55

Fills a DataFrame with the Cartesian product of the given indices.

	# Goal: Fill missing date/group pairs with fill_value using Cartesian product of indices
	import pandas as pd

	def fill_product(df, index, fill_value=0):
	"""
	Fills a DataFrame with the Cartesian product of the given indices.

	See: http://stackoverflow.com/a/16994910/234233

	Example:

ramhiser / date-range.py

Last active July 5, 2022 10:55

Python generator to construct range of dates

	from datetime import datetime, timedelta

	def date_range(start, end, step=7, date_format="%m-%d-%Y"):
	"""
	Creates generator with a range of dates.
	The dates occur every 7th day (default).

	:param start: the start date of the date range
	:param end: the end date of the date range
	:param step: the step size of the dates

John Ramey ramhiser