@ramhiser
ramhiser / pandas-fixed.py
Last active August 29, 2015 14:15
Reindexing Pandas DataFrame with MultiIndex.from_product triggers missing values
import pandas as pd
df = pd.DataFrame([['01-02-2015', 'a', 17],
                   ['01-09-2015', 'a', 42],
                   ['01-30-2015', 'a', 19],
                   ['01-02-2015', 'b', 23],
                   ['01-23-2015', 'b', 1],
                   ['01-30-2015', 'b', 13]])
df.columns = ['date', 'group', 'response']
df.set_index(['date', 'group'], inplace=True)
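The preview stops before the reindexing step named in the title. A minimal sketch of that step follows, assuming the full index is built from the observed `date` and `group` values; the names `full_index`, `df_full`, and `n_missing` are illustrative and do not come from the gist.

```python
import pandas as pd

# Same data as the preview above.
df = pd.DataFrame(
    [['01-02-2015', 'a', 17],
     ['01-09-2015', 'a', 42],
     ['01-30-2015', 'a', 19],
     ['01-02-2015', 'b', 23],
     ['01-23-2015', 'b', 1],
     ['01-30-2015', 'b', 13]],
    columns=['date', 'group', 'response'],
).set_index(['date', 'group'])

# Build every (date, group) combination from the observed level values.
# (Assumption: this is how the full index is meant to be constructed.)
full_index = pd.MultiIndex.from_product(
    [df.index.get_level_values('date').unique(),
     df.index.get_level_values('group').unique()],
    names=['date', 'group'],
)

# Combinations absent from the original index come back as NaN rows.
df_full = df.reindex(full_index)
n_missing = int(df_full['response'].isna().sum())
```

With this data, four distinct dates and two groups give eight rows, and the two unobserved pairs ('01-09-2015', 'b') and ('01-23-2015', 'a') are filled with NaN.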
@ramhiser
ramhiser / huber.py
Created January 21, 2015 17:39
Robust Estimation of Mean and Standard Deviation in Python via the Huber Estimator
import numpy as np
from statsmodels.robust.scale import huber
# Mean and standard deviation to generate normal random variates
mean, std_dev = 0, 2
sample_size = 25
np.random.seed(42)
x = np.random.normal(mean, std_dev, sample_size)
# Appends a couple of outliers
@ramhiser
ramhiser / cut-pretty.r
Last active August 29, 2015 14:08
Cuts a vector into factors with pretty levels
#' Cuts a vector into factors with pretty levels
#'
#' @param x numeric vector
#' @param breaks numeric vector of two or more unique cut points
#' @param collapse character string to collapse factor labels
#' @param ... arguments passed to \code{\link[base]{cut}}
#' @return A \code{\link{factor}} is returned
#'
#' @examples
#' set.seed(42)
@ramhiser
ramhiser / random-forest.r
Created October 22, 2014 21:57
Plots Variable Importance from Random Forest in R
library(randomForest)
library(dplyr)
library(ggplot2)
set.seed(42)
rf_out <- randomForest(Species ~ ., data=iris)
# Extracts variable importance (Mean Decrease in Gini Index)
# Sorts by variable importance and relevels factors to match ordering
@ramhiser
ramhiser / object-sizes.r
Created September 29, 2014 17:29
Object Size of R Objects in Memory
library(dplyr)
objects <- ls()
object_sizes <- sapply(objects, function(x) object.size(get(x)))
object_sizes <- data.frame(objects, object_sizes, row.names=NULL)
object_sizes$units_MB <- utils:::format.object_size(object_sizes$object_sizes, units="Mb")
dplyr::arrange(object_sizes, object_sizes)
@ramhiser
ramhiser / gompertz.R
Created July 7, 2014 15:59
Exponentially weighting Bernoulli trials for a Beta prior
# For the standard conjugate beta prior for a binomial likelihood, a typical
# approach is to weight each prior observation equally. However, there are times
# when the prior Bernoulli trials should be weighted over time, so that the more
# recent trials are weighted near 1 and the oldest trials are weighted near 0.
# Gompertz Function
# http://en.wikipedia.org/wiki/Gompertz_function
gompertz <- function(x, a=1, b=1, c=1) {
  a * exp(-b * exp(-c * x))
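The preview ends inside the Gompertz function, so how the weights feed into the prior is not shown. The following Python sketch shows one way the idea in the comment could work: each trial's weight is a Gompertz value, and the weighted successes and failures become the Beta parameters. The helper `weighted_beta_counts` and the choice to centre the time axis on the middle trial are assumptions for illustration, not the gist's method.

```python
import math


def gompertz(x, a=1.0, b=1.0, c=1.0):
    """Gompertz function: rises from 0 toward the asymptote `a` as x grows."""
    return a * math.exp(-b * math.exp(-c * x))


def weighted_beta_counts(trials, b=1.0, c=1.0):
    """Return (alpha, beta) pseudo-counts from time-weighted Bernoulli trials.

    `trials` is a sequence of 0/1 outcomes ordered oldest to newest.
    Assumption: positions are centred on the midpoint, so older trials
    receive smaller weights and newer trials receive larger ones.
    """
    midpoint = (len(trials) - 1) / 2.0
    alpha = 0.0
    beta = 0.0
    for position, outcome in enumerate(trials):
        weight = gompertz(position - midpoint, b=b, c=c)
        alpha += weight * outcome        # weighted successes
        beta += weight * (1 - outcome)   # weighted failures
    return alpha, beta
```

Larger values of `c` make the weights move from near 0 to near 1 more sharply, which discounts old trials more aggressively.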
@ramhiser
ramhiser / jaccard.py
Last active November 4, 2021 08:41
Jaccard cluster similarity in Python
import itertools
def jaccard(labels1, labels2):
    """
    Computes the Jaccard similarity between two sets of clustering labels.
    The value returned is between 0 and 1, inclusive. A value of 1 indicates
    perfect agreement between two clustering algorithms, whereas a value of 0
    indicates no agreement. For details on the Jaccard index, see:
    http://en.wikipedia.org/wiki/Jaccard_index
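The function body is cut off in the preview. One common way to compute a Jaccard index between two clusterings is to count pairs of observations: pairs grouped together in both clusterings, divided by pairs grouped together in at least one. The sketch below follows that pair-counting approach and may differ from the gist's actual implementation; the name `jaccard_pairs` and the convention of returning 1.0 when no pair is co-clustered in either labeling are assumptions.

```python
import itertools


def jaccard_pairs(labels1, labels2):
    """Pair-counting Jaccard similarity between two clusterings."""
    if len(labels1) != len(labels2):
        raise ValueError("label sequences must have the same length")

    n11 = n10 = n01 = 0
    for i, j in itertools.combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        if same1 and same2:
            n11 += 1   # together in both clusterings
        elif same1:
            n10 += 1   # together only in the first
        elif same2:
            n01 += 1   # together only in the second

    denominator = n11 + n10 + n01
    if denominator == 0:
        # Assumption: all-singleton clusterings are treated as identical.
        return 1.0
    return n11 / denominator
```

The result depends only on which observations share a cluster, so relabeling the clusters (for example, swapping 0 and 1) does not change the score.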
@ramhiser
ramhiser / adjacency-matrix.r
Created May 16, 2014 21:28
Create adjacency matrix from cluster labels
library(clusteval)
adjacency_matrix <- function(cluster_labels, names=NULL, force_symmetric=FALSE) {
  adj_matrix <- diag(length(cluster_labels))
  adj_matrix[lower.tri(adj_matrix)] <- clusteval::comembership(cluster_labels)
  if (force_symmetric) {
    adj_matrix <- adj_matrix + t(adj_matrix)
  }
  diag(adj_matrix) <- 0
  if (!is.null(names))
    rownames(adj_matrix) <- colnames(adj_matrix) <- names
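For comparison, the same idea can be sketched in Python with NumPy: two observations are adjacent when they share a cluster label, and the diagonal is zeroed. This version always returns the symmetric matrix, which corresponds to `force_symmetric=TRUE` in the R preview; it is an illustrative equivalent, not a port of the gist.

```python
import numpy as np


def adjacency_matrix(cluster_labels):
    """Symmetric 0/1 adjacency matrix: 1 where two items share a cluster."""
    labels = np.asarray(cluster_labels)
    # Broadcast a column against a row to compare every pair of labels.
    adj = (labels[:, None] == labels[None, :]).astype(int)
    # An observation is not considered adjacent to itself.
    np.fill_diagonal(adj, 0)
    return adj
```

Broadcasting compares all pairs at once, so no explicit loop over pairs is needed.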
@ramhiser
ramhiser / updated-mda-code.r
Created May 8, 2014 19:59
Updated MDA code for Maha Wael Elbakry
# Comment thread beginning here:
# http://ramhiser.com/blog/2013/07/02/a-brief-look-at-mixture-discriminant-analysis/#comment-1374749931
#
# I'm using version 0.4-4 of the `mda` package
library(mda)
test_data <- read.csv("ts2.csv")
colnames(test_data)[16] <- "filter"
mda_out <- mda(formula=fol_up_u ~ . - filter,
data=test_data,
CV=TRUE,
@ramhiser
ramhiser / try_backoff.r
Last active August 2, 2022 15:11
Try/catch in R with exponential backoff
#' Try/catch with exponential backoff
#'
#' Attempts the expression in \code{expr} up to the number of tries specified in
#' \code{max_attempts}. Each time a failure results, the function sleeps for a
#' random amount of time before re-attempting the expression. The upper bound of
#' the backoff increases exponentially after each failure.
#'
#' For details on exponential backoff, see:
#' \url{http://en.wikipedia.org/wiki/Exponential_backoff}
#'
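The R function itself is not shown in the preview. The behavior described above (retry up to a maximum number of attempts, sleeping a random time whose upper bound doubles after each failure) can be sketched in Python as follows. The parameters `base`, `cap`, and `sleep` are assumptions added for illustration and are not taken from the gist.

```python
import random
import time


def try_backoff(func, max_attempts=10, base=1.0, cap=60.0, sleep=time.sleep):
    """Call `func` until it succeeds or `max_attempts` is reached.

    After the k-th failure, sleeps a random time in [0, min(cap, base * 2**(k-1))].
    `sleep` is injectable so the delay can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            upper_bound = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper_bound))
```

Randomizing the delay, rather than sleeping for exactly the upper bound, helps keep many clients that failed at the same moment from retrying in lockstep.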