ramhiser’s gists

ramhiser / pandas-fixed.py

Last active August 29, 2015 14:15

Reindexing Pandas DataFrame with MultiIndex.from_product triggers missing values

	import pandas as pd

	df = pd.DataFrame([['01-02-2015', 'a', 17],
	['01-09-2015', 'a', 42],
	['01-30-2015', 'a', 19],
	['01-02-2015', 'b', 23],
	['01-23-2015', 'b', 1],
	['01-30-2015', 'b', 13]])
	df.columns = ['date', 'group', 'response']
	df.set_index(['date', 'group'], inplace=True)
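
The preview above stops before the reindexing step named in the title. The following is a minimal sketch of that step, assuming the goal is to reindex over every (date, group) combination; the names `full_index`, `reindexed`, and `missing` are illustrative and do not come from the gist.

```python
import pandas as pd

df = pd.DataFrame(
    [["01-02-2015", "a", 17],
     ["01-09-2015", "a", 42],
     ["01-30-2015", "a", 19],
     ["01-02-2015", "b", 23],
     ["01-23-2015", "b", 1],
     ["01-30-2015", "b", 13]],
    columns=["date", "group", "response"],
).set_index(["date", "group"])

# Build every (date, group) combination from the observed index values.
# `full_index` is a hypothetical name, not taken from the gist.
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values("date").unique(),
     df.index.get_level_values("group").unique()],
    names=["date", "group"],
)

# Combinations that never occurred in the data appear as NaN rows.
reindexed = df.reindex(full_index)
missing = reindexed[reindexed["response"].isna()]
```

Here `missing` holds the rows introduced by the reindex, which is one way to see which combinations had no observation.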

ramhiser / huber.py

Created January 21, 2015 17:39

Robust Estimation of Mean and Standard Deviation in Python via the Huber Estimator

	import numpy as np
	from statsmodels.robust.scale import huber

	# Mean and standard deviation to generate normal random variates
	mean, std_dev = 0, 2
	sample_size = 25
	np.random.seed(42)
	x = np.random.normal(mean, std_dev, sample_size)

	# Appends a couple of outliers

ramhiser / cut-pretty.r

Last active August 29, 2015 14:08

Cuts a vector into factors with pretty levels

	#' Cuts a vector into factors with pretty levels
	#'
	#' @param x numeric vector
	#' @param breaks numeric vector of two or more unique cut points
	#' @param collapse character string to collapse factor labels
	#' @param ... arguments passed to \code{\link[base]{cut}}
	#' @return A \code{\link{factor}} is returned
	#'
	#' @examples
	#' set.seed(42)

ramhiser / random-forest.r

Created October 22, 2014 21:57

Plots Variable Importance from Random Forest in R

	library(randomForest)
	library(dplyr)
	library(ggplot2)

	set.seed(42)

	rf_out <- randomForest(Species ~ ., data=iris)

	# Extracts variable importance (Mean Decrease in Gini Index)
	# Sorts by variable importance and relevels factors to match ordering

ramhiser / object-sizes.r

Created September 29, 2014 17:29

Object Size of R Objects in Memory

	library(dplyr)

	objects <- ls()
	object_sizes <- sapply(objects, function(x) object.size(get(x)))
	object_sizes <- data.frame(objects, object_sizes, row.names=NULL)
	object_sizes$units_MB <- utils:::format.object_size(object_sizes$object_sizes, units="Mb")
	dplyr::arrange(object_sizes, object_sizes)

ramhiser / gompertz.R

Created July 7, 2014 15:59

Exponentially weighting Bernoulli trials for a Beta prior

	# For the standard conjugate beta prior for a binomial likelihood, a typical
	# approach is to weight each prior observation equally. At times, however,
	# the prior Bernoulli trials should be weighted over time, so that the more
	# recent trials are weighted near 1 and the oldest trials are weighted
	# near 0.

	# Gompertz Function
	# http://en.wikipedia.org/wiki/Gompertz_function
	gompertz <- function(x, a=1, b=1, c=1) {
	a * exp(-b * exp(-c * x))
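
The weighting idea in the comments above can be sketched in Python. This is an illustration under stated assumptions, not the gist's implementation: the helper `weighted_beta_params`, the mapping of trial order onto the interval `[lo, hi]`, and the prior `Beta(alpha0, beta0)` are all hypothetical choices.

```python
import math


def gompertz(x, a=1.0, b=1.0, c=1.0):
    """Gompertz function: a * exp(-b * exp(-c * x))."""
    return a * math.exp(-b * math.exp(-c * x))


def weighted_beta_params(trials, alpha0=1.0, beta0=1.0, lo=-5.0, hi=5.0):
    """Hypothetical helper: weighted Beta parameters from 0/1 trials.

    `trials` is ordered oldest to newest. Trial positions are spread evenly
    over [lo, hi] (an assumption), so the oldest trial gets a weight near 0
    and the newest a weight near 1.
    """
    n = len(trials)
    alpha, beta = alpha0, beta0
    for i, y in enumerate(trials):
        # A single trial is treated as the most recent one.
        x = lo + (hi - lo) * i / (n - 1) if n > 1 else hi
        w = gompertz(x)
        alpha += w * y        # weighted successes
        beta += w * (1 - y)   # weighted failures
    return alpha, beta
```

With the default interval, an old success contributes almost nothing to `alpha`, while a recent failure adds close to 1 to `beta`.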

ramhiser / jaccard.py

Last active November 4, 2021 08:41

Jaccard cluster similarity in Python

	import itertools

	def jaccard(labels1, labels2):
	    """
	    Computes the Jaccard similarity between two sets of clustering labels.

	    The value returned is between 0 and 1, inclusive. A value of 1 indicates
	    perfect agreement between two clustering algorithms, whereas a value of 0
	    indicates no agreement. For details on the Jaccard index, see:
	    http://en.wikipedia.org/wiki/Jaccard_index
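
The preview cuts off inside the docstring, so the function body is not shown. One common way to compute a Jaccard similarity between two clusterings is to count pairs of observations, sketched below. The name `jaccard_pairs` and the convention of returning 1.0 when no pair is co-clustered in either labeling are assumptions, not taken from the gist.

```python
import itertools


def jaccard_pairs(labels1, labels2):
    """Pair-counting Jaccard similarity between two label sequences.

    n11: pairs placed together by both clusterings.
    n10: pairs together in labels1 only; n01: pairs together in labels2 only.
    """
    if len(labels1) != len(labels2):
        raise ValueError("label sequences must have the same length")

    n11 = n10 = n01 = 0
    for i, j in itertools.combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            n11 += 1
        elif same1:
            n10 += 1
        elif same2:
            n01 += 1

    denom = n11 + n10 + n01
    # Assumed convention: with no co-clustered pairs at all, both labelings
    # are all singletons and therefore agree.
    return 1.0 if denom == 0 else n11 / denom
```

Because only co-membership matters, relabeling the clusters (for example swapping 0 and 1) leaves the score unchanged.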

ramhiser / adjacency-matrix.r

Created May 16, 2014 21:28

Create adjacency matrix from cluster labels

	library(clusteval)
	adjacency_matrix <- function(cluster_labels, names=NULL, force_symmetric=FALSE) {
	  adj_matrix <- diag(length(cluster_labels))
	  adj_matrix[lower.tri(adj_matrix)] <- clusteval::comembership(cluster_labels)
	  if (force_symmetric) {
	    adj_matrix <- adj_matrix + t(adj_matrix)
	  }
	  diag(adj_matrix) <- 0
	  if (!is.null(names))
	    rownames(adj_matrix) <- colnames(adj_matrix) <- names
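
For comparison, the same idea can be sketched in plain Python without any clustering package. This is an illustrative analogue, not a port of the R function: it marks a 1 wherever two observations share a cluster label, fills the lower triangle by default, and leaves the diagonal at 0.

```python
def adjacency_matrix(cluster_labels, force_symmetric=False):
    """Build a 0/1 adjacency matrix (list of lists) from cluster labels.

    Entry [i][j] with i > j is 1 when observations i and j share a label.
    With force_symmetric=True the upper triangle is mirrored as well.
    """
    n = len(cluster_labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):  # j < i, so this walks the lower triangle
            if cluster_labels[i] == cluster_labels[j]:
                adj[i][j] = 1
                if force_symmetric:
                    adj[j][i] = 1
    return adj
```

For example, `adjacency_matrix(["a", "a", "b"], force_symmetric=True)` links only the first two observations.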

ramhiser / updated-mda-code.r

Created May 8, 2014 19:59

Updated MDA code for Maha Wael Elbakry

	# Comment thread beginning here:
	# http://ramhiser.com/blog/2013/07/02/a-brief-look-at-mixture-discriminant-analysis/#comment-1374749931
	#
	# I'm using version 0.4-4 of the `mda` package
	library(mda)
	test_data <- read.csv("ts2.csv")
	colnames(test_data)[16] <- "filter"
	mda_out <- mda(formula=fol_up_u ~ . - filter,
	data=test_data,
	CV=TRUE,

ramhiser / try_backoff.r

Last active August 2, 2022 15:11

Try/catch in R with exponential backoff

	#' Try/catch with exponential backoff
	#'
	#' Attempts the expression in \code{expr} up to the number of tries specified in
	#' \code{max_attempts}. Each time a failure results, the function sleeps for a
	#' random amount of time before re-attempting the expression. The upper bound of
	#' the backoff increases exponentially after each failure.
	#'
	#' For details on exponential backoff, see:
	#' \url{http://en.wikipedia.org/wiki/Exponential_backoff}
	#'
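
The behavior described in this documentation can be sketched in Python as follows. This is a minimal illustration, not a translation of the R function: the parameters `base` and `sleep`, and the specific upper bound `base * 2 ** attempt`, are assumptions chosen for the example.

```python
import random
import time


def try_backoff(func, max_attempts=5, base=1.0, sleep=time.sleep):
    """Call func() up to max_attempts times with randomized backoff.

    After the k-th failure the function sleeps for a random duration drawn
    uniformly from [0, base * 2 ** k], so the upper bound doubles after each
    failure. The final failure is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, base * 2 ** attempt))
```

Passing `sleep` as a parameter is a design choice that lets callers substitute a stub in tests instead of waiting in real time.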

John Ramey ramhiser