Andrew Landgraf andland

Below are links to manuscripts or packages related to my talk at JSM on Supervised Dimensionality Reduction. Session info here.

My slides will be here.

Contact:

I will not share anyone's information. I will only use aggregated data with no personal information.

Below are links to manuscripts or packages related to my talk at JSM on Generalized PCA. Session info here.

My slides are here.

	Title	URL
	Data Clustering in C++	http://www.amazon.com/Data-Clustering-Object-Oriented-Knowledge-Discovery/dp/1439862230
	Transportation Statistics and Microsimulation	http://www.amazon.com/Transportation-Statistics-Microsimulation-Clifford-Spiegelman/dp/1439800235
	Fundamentals of Transportation and Traffic Operations	http://www.amazon.com/Fundamentals-Transportation-Traffic-Operations-Daganzo/dp/0080427855
	A First Course in Stochastic Processes	http://www.amazon.com/First-Course-Stochastic-Processes-Second/dp/0123985528
	A Probability Path	http://www.amazon.com/A-Probability-Path-Sidney-Resnick/dp/081764055X
	A Primer on Linear Models	http://www.amazon.com/Primer-Linear-Chapman-Statistical-Science/dp/1420062018
	A Statistical Approach to Genetic Epidemiology	http://www.amazon.com/Statistical-Approach-Genetic-Epidemiology-Applications/dp/3527323899
	Introduction to Transportation Engineering	http://www.amazon.com/Introduction-Transportation-Engineering-Banks-James/dp/0072431881

See this link for an introduction to time stacking and time slicing.

time_slice.R requires the image's width or height in pixels (whichever dimension you slice along) to be a multiple of the number of images in your timelapse.

time_slice_v2.R works around this restriction by letting some images contribute more pixels per slice than others: the first x% of the images cover the first x% of the pixels (with appropriate rounding). It does not handle the case where the number of images exceeds the image's width or height in pixels. Version 2 will probably work better for you.

For example, if the images are 150 pixels wide and your timelapse has 100 images, time_slice.R gives the first image a slice 51 pixels wide and each of the remaining 99 images a slice 1 pixel wide. time_slice_v2.R instead alternates between 1 pixel per image and 2 pixels per image, so the 150 pixels are spread much more evenly across the 100 images.
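To make the two allocation rules concrete, here is a minimal R sketch that only reproduces the slice widths described in the example. It is not the code of time_slice.R or time_slice_v2.R, and the function names slice_widths_v1(), slice_widths_v2() and pixel_to_image() are hypothetical. The v1-style rule gives every image width %/% n_images pixels and hands all leftover pixels to the first image. The v2-style rule ends image i at pixel floor(i * width / n_images), so the first x% of images cover the first x% of pixels.

```r
# Hypothetical helpers: they reproduce the slice widths described above,
# not the actual implementation of time_slice.R or time_slice_v2.R.

# v1-style: every image gets width %/% n_images pixels and the leftover
# pixels all go to the first image.
slice_widths_v1 <- function(width, n_images) {
  base <- width %/% n_images
  widths <- rep(base, n_images)
  widths[1] <- width - (n_images - 1) * base
  widths
}

# v2-style: image i ends at pixel floor(i * width / n_images), so the first
# x% of the images cover (about) the first x% of the pixels.
slice_widths_v2 <- function(width, n_images) {
  stopifnot(n_images <= width)  # more images than pixels is not handled
  ends <- (seq_len(n_images) * width) %/% n_images  # integer division, exact
  diff(c(0, ends))
}

# Map every pixel column to the index of the image it is taken from.
pixel_to_image <- function(widths) rep(seq_along(widths), times = widths)

w1 <- slice_widths_v1(150, 100)
w1[1:3]   # 51 1 1
w2 <- slice_widths_v2(150, 100)
w2[1:6]   # 1 2 1 2 1 2
sum(w2)   # 150
pixel_to_image(w2)[1:6]   # 1 2 2 3 4 4
```

Using integer division for the slice boundaries avoids floating-point rounding, so the v2 widths always add up to the full image width.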

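	roll_die <- function(status, strategy) {
		# every face (each entry of status, including "raven", plus "basket") is equally likely
		roll = sample(c(names(status), "basket"), 1)
		if (roll == "basket") {
			# basket: choose among the non-raven entries whose count is still positive
			trees = status[names(status) != "raven" & status > 0]
			if (strategy == "optimal") {
				# pick (one of) the trees with the largest count
				biggest_trees = trees[trees == max(trees)]
				roll = sample(names(biggest_trees), 1)
			} else if (strategy == "random") {
				roll = sample(names(trees), 1)
			} else if (strategy == "worst") {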

	library(XML)
	library(lubridate)
	library(sqldf)
	library(reshape2)
	library(ggplot2)
	library(mgcv)

	cat("loading old data...\n")
	playlist=read.csv("CD101Playlist.csv",stringsAsFactors=FALSE)
	colnames(playlist)[3]="Last Played"

	library(shiny)
	library(ggplot2)
	exp.age.df=read.csv("https://dl.dropboxusercontent.com/u/17648661/ExpAgeByNameYear.csv")

	age.range=range(exp.age.df$Age)

	unique.names=sort(unique(exp.age.df$Name))
	unique.names=c("<NONE>",as.character(unique.names))

	start.names=c("Andrew","Dylan","Fred","Grace","Lillian","John")

	# rm(list=ls())
	setwd("Kaggle/Amazon Employee")

	train = read.csv("train.csv")
	test = read.csv("test.csv")
	train$ROLE_TITLE <- NULL # dropped because it duplicates ROLE_CODE
	test$ROLE_TITLE <- NULL # dropped because it duplicates ROLE_CODE

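	# for each row of `matrix`, count the columns whose value equals the corresponding element of `vec`
	jaccard <- function(vec, matrix) {
		rowSums(as.matrix(sweep(matrix, 2, as.numeric(vec), "==")))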
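Despite its name, jaccard() returns, for each row of matrix, the number of columns whose value equals the corresponding element of vec. Treat each record as the set of its p (column, value) pairs. That count m is then the size of the intersection, and the union has 2p - m elements, so the Jaccard index is m / (2p - m). The sketch below shows both quantities. It is an illustration rather than the author's code: match_counts() and jaccard_index() are hypothetical names, and the 3 x 4 matrix is made up.

```r
# Hypothetical helper: same computation as the body of jaccard() above.
match_counts <- function(vec, mat) {
  # TRUE where mat[i, j] == vec[j]; rowSums counts the TRUEs in each row
  rowSums(as.matrix(sweep(mat, 2, as.numeric(vec), "==")))
}

# Hypothetical helper: Jaccard index between vec and each row of mat,
# treating every record as the set of its (column, value) pairs.
jaccard_index <- function(vec, mat) {
  m <- match_counts(vec, mat)
  p <- ncol(mat)
  m / (2 * p - m)  # |intersection| / |union|
}

mat <- rbind(c(1, 2, 3, 4),
             c(1, 5, 3, 9),
             c(7, 8, 6, 0))
vec <- c(1, 2, 3, 0)

match_counts(vec, mat)   # 3 2 1
jaccard_index(vec, mat)  # 0.6000000 0.3333333 0.1428571
```

Because m can never exceed p, the denominator 2p - m is always at least p, so the index stays between 0 and 1.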

	# inspired by http://schamberlain.github.io/2012/01/logistic-regression-barplot-fig/
	logithistplot <- function(data, breaks = "Sturges", se = TRUE) {
		library(ggplot2)
		col_names = names(data)

		# get min and max axis values
		min_x <- min(data[, 1])
		max_x <- max(data[, 1])

		# get bin numbers