@andland
andland / first_orchard_simulation.R
Created January 13, 2020 16:09
Simulates the probability of winning the First Orchard Game https://www.habausa.com/my-very-first-games-first-orchard/
roll_die <- function(status, strategy) {
  roll = sample(c(names(status), "basket"), 1)
  if (roll == "basket") {
    trees = status[names(status) != "raven" & status > 0]
    if (strategy == "optimal") {
      biggest_trees = trees[trees == max(trees)]
      roll = sample(names(biggest_trees), 1)
    } else if (strategy == "random") {
      roll = sample(names(trees), 1)
    } else if (strategy == "worst") {
andland / JSM Supervised Dimensionality Reduction Talk.md
Last active July 31, 2016 16:16
Info related to my 2016 JSM talk on Supervised Dimensionality Reduction for Exponential Family Data.
andland / privacy policy.md
Created February 2, 2016 20:01
Privacy Policy

I will not share anyone's information. I will only use aggregated data with no personal information.

andland / JSM Generalized PCA Talk.md
Last active July 4, 2016 20:22
Info related to my 2015 JSM talk on Generalized PCA.
andland / AmazonBookURLs.csv
Last active August 29, 2015 14:18
Scrape Amazon's Trade-In Value
Title,URL
Data Clustering C++,http://www.amazon.com/Data-Clustering-Object-Oriented-Knowledge-Discovery/dp/1439862230
Transportation Statistics and Microsimulation,http://www.amazon.com/Transportation-Statistics-Microsimulation-Clifford-Spiegelman/dp/1439800235
Fundamentals of Transportation and Traffic Operations,http://www.amazon.com/Fundamentals-Transportation-Traffic-Operations-Daganzo/dp/0080427855
A First Course in Stochastic Processes,http://www.amazon.com/First-Course-Stochastic-Processes-Second/dp/0123985528
A Probability Path,http://www.amazon.com/A-Probability-Path-Sidney-Resnick/dp/081764055X
A Primer on Linear Models,http://www.amazon.com/Primer-Linear-Chapman-Statistical-Science/dp/1420062018
Statistical Approach to Genetic Epidemiology,http://www.amazon.com/Statistical-Approach-Genetic-Epidemiology-Applications/dp/3527323899
Intro Trans Engineering,http://www.amazon.com/Introduction-Transportation-Engineering-Banks-James/dp/0072431881
andland / 00Time_Slicing_README.md
Last active January 1, 2017 13:43
Time Stack and Time Slicing

See this link for an introduction to time stacking and time slicing.

time_slice.R requires the image width or height in pixels to be a multiple of the number of images in your timelapse.

time_slice_v2.R attempts to get around this restriction. Some images will contribute more pixels per slice than others: the first x% of the images cover the first x% of the pixels (with appropriate rounding). It does not handle the case where the number of images is greater than the height or width of the images in pixels. Version 2 will probably work better for you.

For example, if the images are 150 pixels wide and your timelapse has 100 images, time_slice.R will give the first image a slice that is 51 pixels wide. The remaining 99 images will get slices that are 1 pixel wide. time_slice_v2.R will alternate between 1 pixel per i
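
The proportional allocation described for time_slice_v2.R can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in time_slice_v2.R: the function name slice_bounds and its arguments are hypothetical, and it assumes the number of images does not exceed the number of pixels.

```r
# Hypothetical helper (not taken from time_slice_v2.R): assign each image a
# contiguous range of pixel columns so that the first x% of the images cover
# roughly the first x% of the pixels.
slice_bounds <- function(n_images, n_pixels) {
  stopifnot(n_images >= 1, n_images <= n_pixels)
  idx <- seq_len(n_images)
  # Last pixel column covered by each image, rounded to a whole pixel.
  ends <- round(idx * n_pixels / n_images)
  # Each slice starts one pixel after the previous slice ends.
  starts <- c(0, ends[-n_images]) + 1
  data.frame(image = idx, start = starts, end = ends,
             width = ends - starts + 1)
}

# The example from the text: 100 images, each 150 pixels wide.
bounds <- slice_bounds(n_images = 100, n_pixels = 150)
sum(bounds$width)    # total pixels covered: 150
range(bounds$width)  # narrowest and widest slice: 1 and 2
```

Because the slice boundaries are rounded cumulatively rather than per slice, the widths always add up to the full image width and no single image absorbs the whole remainder.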

andland / global.R
Created December 28, 2013 03:19
CD102.5 Top Songs by Artist in 2013
library(XML)
library(lubridate)
library(sqldf)
library(reshape2)
library(ggplot2)
library(mgcv)
cat("loading old data...\n")
playlist=read.csv("CD101Playlist.csv",stringsAsFactors=FALSE)
colnames(playlist)[3]="Last Played"
andland / global.R
Created November 6, 2013 01:08
Shiny App plotting the average age of everyone in America with a given name in a particular year. To run, install shiny in R and type: library(shiny); runGist(7329230)
library(shiny)
library(ggplot2)
exp.age.df=read.csv("https://dl.dropboxusercontent.com/u/17648661/ExpAgeByNameYear.csv")
age.range=range(exp.age.df$Age)
unique.names=sort(unique(exp.age.df$Name))
unique.names=c("<NONE>",as.character(unique.names))
start.names=c("Andrew","Dylan","Fred","Grace","Lillian","John")
andland / kaggle_amazon_nearest_neighbor.R
Last active December 18, 2015 22:29
A simple nearest neighbor algorithm for a dataset with categorical variables. This code was written for the Amazon Employee Access challenge on Kaggle.com.
# rm(list=ls())
setwd("Kaggle/Amazon Employee")
train = read.csv("train.csv")
test = read.csv("test.csv")
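train$ROLE_TITLE <- NULL # Redundant: carries the same information as ROLE_CODE
test$ROLE_TITLE <- NULL # Redundant: carries the same information as ROLE_CODE
# For each row of `matrix`, count the columns whose value equals the one in `vec`
jaccard <- function(vec, matrix) {
  rowSums(as.matrix(sweep(matrix, 2, as.numeric(vec), "==")))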
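
The matching step inside jaccard can be illustrated on toy data. This is a minimal sketch, not part of the gist: the matrix train_x, the vector query, and the column values are hypothetical. Note that the quantity computed is a count of matching columns, which is closer to a simple matching score than to a textbook Jaccard index.

```r
# Hypothetical toy data: three training rows, two categorical columns
# stored as integer codes.
train_x <- cbind(RESOURCE  = c(1, 1, 2),
                 ROLE_CODE = c(10, 20, 10))

# Hypothetical query row with the same column order.
query <- c(1, 10)

# sweep() compares every row to the query column by column;
# rowSums() then counts how many columns agree for each row.
matches <- rowSums(sweep(train_x, 2, query, "=="))
matches             # 2 1 1

# The nearest neighbor is the training row with the most matching columns.
which.max(matches)  # 1
```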
andland / logithistplot.R
Last active October 20, 2017 13:02
Plot the relationship between a continuous and a binary variable, with the distribution of the continuous variable conditional on the binary variable. It includes a logistic and spline fit. You can add more layers to the result using standard ggplot2 syntax.
# inspired by http://schamberlain.github.io/2012/01/logistic-regression-barplot-fig/
logithistplot <- function(data, breaks = "Sturges", se = TRUE) {
  require(ggplot2)
  col_names = names(data)
  # get min and max axis values
  min_x <- min(data[, 1])
  max_x <- max(data[, 1])
  # get bin numbers