@ramhiser
ramhiser / cutpoint.r
Created June 15, 2013 05:20
Mixture of Normal and Uniform to Construct 1D Gate
cut_mixture <- function(x, range = NULL, tol = 1e-6, maxit = 100) {
n <- length(x)
if (is.null(range)) {
range <- c(min(x), max(x))
}
# Initialization
# The initial value of pi1 is the proportion of x within 2 standard deviations of mu
huber_out <- huber(x)
@ramhiser
ramhiser / gist:5888640
Created June 28, 2013 22:24
Kolmogorov-Smirnov Test to Determine Nearest Sample
set.seed(42)
n <- 50
x1 <- rnorm(n, mean = 5)
x2 <- rnorm(n, mean = 0)
x3 <- rnorm(n, mean = 20)
plot(density(x1), xlim = c(-5, 25))
lines(density(x2), col = "red")
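The preview stops before any test is run. As a rough Python sketch of the idea in the title, assuming "nearest" means the candidate sample with the smallest two-sample Kolmogorov-Smirnov statistic relative to a reference sample (the names `ks_statistic` and `nearest_sample` are hypothetical, not taken from the gist):

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest absolute gap between the empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The largest gap is attained at an observed value, so only those are checked.
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def nearest_sample(reference, candidates):
    """Return the name of the candidate with the smallest KS statistic (hypothetical helper)."""
    return min(candidates, key=lambda name: ks_statistic(reference, candidates[name]))


# Same setup as the R preview: three normal samples with means 5, 0, and 20.
random.seed(42)
n = 50
x1 = [random.gauss(5, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(20, 1) for _ in range(n)]
closest = nearest_sample(x1, {"x2": x2, "x3": x3})
```

The statistic is bounded by 1, so samples that do not overlap at all tie at 1.0; `min` then returns the first candidate listed.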
@ramhiser
ramhiser / cutpoint.r
Last active December 20, 2015 20:28
Cutpoint Method using Mixture of Two Normals
normalmix_loglike <- function(x, y) {
y <- as.factor(y)
x1 <- x[y == levels(y)[1]]
x2 <- x[y == levels(y)[2]]
n1 <- length(x1)
n2 <- length(x2)
n <- n1 + n2
w1 <- n1 / n
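The preview ends at the first mixing weight. A minimal Python sketch of how a two-normal mixture log-likelihood could be completed, assuming each component's mean and standard deviation are estimated from its labeled group (that step, and the helper `normal_pdf`, are assumptions and are not shown in the gist):

```python
import math
from statistics import mean, stdev


def normal_pdf(v, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normalmix_loglike(x, y):
    """Log-likelihood of x under a two-normal mixture defined by the labels y."""
    labels = sorted(set(y))  # plays the role of levels(as.factor(y)); assumes two labels
    x1 = [v for v, lab in zip(x, y) if lab == labels[0]]
    x2 = [v for v, lab in zip(x, y) if lab == labels[1]]
    n1, n2 = len(x1), len(x2)
    w1 = n1 / (n1 + n2)  # mixing weight of the first component
    w2 = 1 - w1
    mu1, mu2 = mean(x1), mean(x2)      # assumed: per-group sample means
    s1, s2 = stdev(x1), stdev(x2)      # assumed: per-group sample standard deviations
    return sum(
        math.log(w1 * normal_pdf(v, mu1, s1) + w2 * normal_pdf(v, mu2, s2))
        for v in x
    )


x = [0.1, -0.2, 0.3, 5.1, 4.8, 5.3]
good_split = normalmix_loglike(x, ["a", "a", "a", "b", "b", "b"])
bad_split = normalmix_loglike(x, ["a", "b", "a", "b", "a", "b"])
```

A labeling that separates the two clusters yields a higher log-likelihood than one that mixes them, which is the property a cutpoint search would exploit.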
@ramhiser
ramhiser / download-espn-mlb-standings.py
Last active December 15, 2016 12:51
Python script to scrape ESPN for American League standings by team.
# The following script scrapes ESPN's MLB Standings Grid and writes the
# standings for each American League (AL) team to a CSV file, which has the following
# format:
# Team, Opponent, Wins, Losses
from bs4 import BeautifulSoup
import urllib2
import re
import csv
@ramhiser
ramhiser / CyTOF-visualization.Rmd
Last active February 15, 2016 21:10
3D-PCA of CyTOF Data from Newell et al. (2012)
Interactive Visualization of CyTOF Data
========================================================
```{r setup, echo=FALSE}
library(rgl)
knit_hooks$set(webgl = hook_webgl)
opts_knit$set(upload.fun = imgur_upload, base.url = NULL) # upload all images to imgur.com
```
This report produces a 3D visualization of the CD8+ T-cell subsets from [Newell et al. (2012)](http://www.ncbi.nlm.nih.gov/pubmed/22265676) using principal components analysis (PCA).
@ramhiser
ramhiser / monty-python-collocations.py
Last active December 24, 2015 21:49
Brief analysis of the collocations of the Monty Python and the Holy Grail script.
import nltk
from nltk.collocations import *
from nltk.book import *
import re
bigram_measures = nltk.collocations.BigramAssocMeasures()
# Monty Python and the Holy Grail
# Reduces tokens to words. Ignores ALL CAPS words, which mark the speaker in the script.
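The filtering step described in the comment can be sketched with the standard library alone. This is an illustration, not the gist's code: it ranks bigrams by raw frequency with `collections.Counter` rather than NLTK's association measures, and the helper names and sample tokens are made up.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def words_only(tokens):
    """Keep word-like tokens and drop ALL CAPS ones, which name the speaker."""
    return [
        t.lower()
        for t in tokens
        # Single letters such as "I" are kept even though they are upper case.
        if WORD.fullmatch(t) and not (t.isupper() and len(t) > 1)
    ]


def top_bigrams(tokens, k=3):
    """Return the k most frequent adjacent word pairs after filtering."""
    words = words_only(tokens)
    return Counter(zip(words, words[1:])).most_common(k)


tokens = "ARTHUR : Whoa there ! GALAHAD : Holy Grail ? ARTHUR : The Holy Grail !".split()
best = top_bigrams(tokens, 1)
```

Because filtering happens before pairing, words on either side of a removed token become adjacent; a stricter version would break pairs at speaker changes.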
@ramhiser
ramhiser / census-regions.r
Last active August 29, 2015 13:56
Creates data.frame of census regions by state
# For info about census regions, see:
# http://en.wikipedia.org/wiki/List_of_regions_of_the_United_States#Census_Bureau-designated_regions_and_divisions
# Region - Northeast
# Division - New England
new_england <- data.frame(
region = "Northeast",
division = "New England",
state = c("ME", "NH", "VT", "MA", "RI", "CT")
@ramhiser
ramhiser / dataframe_multiindex_columns.py
Last active August 29, 2015 13:56
Create a Pandas DataFrame with columns named using a MultiIndex
import numpy as np
import pandas as pd
from itertools import chain, izip, repeat
np.random.seed(42)
num_rows = 10
num_features = 5
num_feature_values = 3
# Builds tuples of features with many values per feature
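The tuple-building step is cut off in the preview. One way it might look in Python 3, where `itertools.izip` no longer exists and the built-in `zip` is already lazy (the `feature{i}` and `value{j}` names are placeholders, not the gist's):

```python
from itertools import chain, repeat

num_features = 5
num_feature_values = 3

# Placeholder labels; the gist's actual naming scheme is not visible in the preview.
feature_names = ["feature{}".format(i) for i in range(num_features)]
value_names = ["value{}".format(j) for j in range(num_feature_values)]

# Pair every feature name with each of its value names, feature by feature.
column_tuples = list(
    chain.from_iterable(
        zip(repeat(name, num_feature_values), value_names)
        for name in feature_names
    )
)
```

The resulting list of `(feature, value)` pairs can be handed to `pd.MultiIndex.from_tuples(column_tuples)` and used as the `columns` argument of a DataFrame.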
@ramhiser
ramhiser / filter-small-groups.r
Last active August 29, 2015 14:00
Filters out groups in a data.frame having fewer than a specified number of observations
library(dplyr)
group_size <- 20
foo <- iris[1:119, ]
filter(group_by(foo, Species), n() >= group_size)
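For comparison, the same filter can be sketched in plain Python. The rows below are synthetic stand-ins that mimic `iris[1:119, ]`, where setosa and versicolor have 50 rows each and virginica only 19; `filter_small_groups` is a hypothetical helper name.

```python
from collections import Counter


def filter_small_groups(rows, key, group_size):
    """Keep only rows whose group (given by `key`) has at least `group_size` members."""
    counts = Counter(row[key] for row in rows)
    return [row for row in rows if counts[row[key]] >= group_size]


# Synthetic stand-in for the first 119 rows of iris.
rows = (
    [{"Species": "setosa"} for _ in range(50)]
    + [{"Species": "versicolor"} for _ in range(50)]
    + [{"Species": "virginica"} for _ in range(19)]
)
kept = filter_small_groups(rows, "Species", group_size=20)
kept_species = sorted({row["Species"] for row in kept})
```

With a threshold of 20, the 19 virginica rows are dropped and the two full groups remain.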
@ramhiser
ramhiser / latlong2fips.r
Created May 6, 2014 03:35
Latitude/Longitude to FIPS Codes via the FCC's API
# FCC's Census Block Conversions API
# http://www.fcc.gov/developers/census-block-conversions-api
latlong2fips <- function(latitude, longitude) {
url <- "http://data.fcc.gov/api/block/find?format=json&latitude=%f&longitude=%f"
url <- sprintf(url, latitude, longitude)
json <- RCurl::getURL(url)
json <- RJSONIO::fromJSON(json)
as.character(json$County['FIPS'])
}
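A Python sketch of the same two steps, building the request URL and pulling the county FIPS code out of the JSON reply. No request is sent here; the sample payload is invented and only mirrors the `County` / `FIPS` fields that the R code reads, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://data.fcc.gov/api/block/find"  # endpoint taken from the R code above


def build_url(latitude, longitude):
    """Build the query URL with the same %f formatting as the sprintf() call."""
    query = urlencode(
        {
            "format": "json",
            "latitude": "%f" % latitude,
            "longitude": "%f" % longitude,
        }
    )
    return BASE_URL + "?" + query


def county_fips(payload):
    """Extract the county FIPS code from a JSON response body."""
    return str(json.loads(payload)["County"]["FIPS"])


# Invented payload for illustration only; a real response may carry more fields.
sample = '{"County": {"FIPS": "00000", "name": "Example"}}'
url = build_url(1.5, -2.25)
fips = county_fips(sample)
```

Keeping URL construction and parsing separate from the network call makes both pieces easy to check without depending on the service being reachable.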