Kory Becker primaryobjects

@primaryobjects
primaryobjects / gerber.R
Created May 9, 2016 17:42
Voting habits analysis with logistic regression and CART regression tree models.
library(ROCR)
library(rpart)
library(rpart.plot)
gerber <- read.csv('gerber.csv')
table(gerber$voting)
table(gerber$voting)[2] / nrow(gerber)
# Which of the four "treatment groups" had the largest percentage of people who actually voted (voting = 1)?
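One way to approach the question above is to average the `voting` indicator within each treatment group. The sketch below runs on a small made-up data frame instead of `gerber.csv`. The treatment column names (`hawthorne`, `civicduty`, `neighbors`, `self`) are assumptions about the dataset and do not appear in the preview, and the toy numbers say nothing about the real result.

```r
# Toy stand-in for gerber.csv; the treatment column names are assumed.
toy <- data.frame(
  voting    = c(1, 0, 1, 1, 0, 0, 1, 0),
  hawthorne = c(1, 1, 0, 0, 0, 0, 0, 0),
  civicduty = c(0, 0, 1, 1, 0, 0, 0, 0),
  neighbors = c(0, 0, 0, 0, 1, 1, 0, 0),
  self      = c(0, 0, 0, 0, 0, 0, 1, 1)
)
groups <- c("hawthorne", "civicduty", "neighbors", "self")

# Share of voters among the rows assigned to each treatment group.
vote_rate <- sapply(groups, function(g) mean(toy$voting[toy[[g]] == 1]))

# Name of the group with the highest share in the toy data.
best_group <- names(which.max(vote_rate))
```

With the real data, the same `sapply` call could be pointed at `gerber` in place of `toy`, provided the column names match.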
primaryobjects / letters.R
Created May 9, 2016 18:03
Predicting letter classification A, B, P, or R using CART classification models and random forests.
library(caTools)
library(rpart)
library(randomForest)
letters <- read.csv('letters_ABPR.csv')
letters$isB <- as.factor(letters$letter == 'B')
set.seed(1000)
spl <- sample.split(letters$isB, SplitRatio = 0.5)
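The preview stops at the `sample.split` call, which yields a logical vector marking the training rows while keeping the class balance of `isB` similar in both halves. The base-R sketch below illustrates that idea with a hypothetical helper, `stratified_split`, on made-up labels. It is an approximation for illustration, not the caTools implementation.

```r
# Hypothetical helper: a base-R stand-in for a stratified split.
# It is not caTools::sample.split, only an approximation of the idea.
stratified_split <- function(labels, ratio) {
  in_train <- logical(length(labels))
  # Sample the requested share of row indices within each class.
  for (idx in split(seq_along(labels), labels)) {
    n_pick <- round(length(idx) * ratio)
    in_train[idx[sample.int(length(idx), n_pick)]] <- TRUE
  }
  in_train
}

set.seed(1000)
is_b <- factor(rep(c(TRUE, FALSE), times = c(20, 60)))  # made-up labels
spl <- stratified_split(is_b, ratio = 0.5)

# TRUE entries select the training rows, FALSE entries the test rows.
train_labels <- is_b[spl]
test_labels  <- is_b[!spl]
```

Because sampling happens per class, both halves keep the same 1:3 ratio of `TRUE` to `FALSE` labels as the full toy vector.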
primaryobjects / over50k.R
Created May 9, 2016 18:47
Analysis of census data to predict over50k income (>= $50,000 per year) using logistic regression, CART classification trees, k-fold cross-validation, and random forests.
library(caTools)
library(ROCR)
library(rpart)
library(rpart.plot)
library(randomForest)
library(caret)
data <- read.csv('census.csv')
set.seed(2000)
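The description mentions k-fold cross-validation, but the preview ends before any of it is shown. As a rough illustration only, the sketch below runs 5-fold cross-validation by hand with a logistic regression on synthetic data. The `age` predictor, the simulated relationship, and the choice of `k = 5` are all assumptions, and the gist itself may well rely on `caret` instead.

```r
set.seed(2000)
n <- 200

# Synthetic stand-in for census.csv: one made-up predictor and a 0/1 outcome.
toy <- data.frame(age = runif(n, 18, 70))
toy$over50k <- rbinom(n, 1, plogis((toy$age - 45) / 5))

k <- 5
# Assign every row to one of k folds at random (equal sizes here).
fold <- sample(rep(seq_len(k), length.out = n))

# Fit on k - 1 folds, score accuracy on the held-out fold.
fold_acc <- sapply(seq_len(k), function(i) {
  fit <- glm(over50k ~ age, data = toy[fold != i, ], family = binomial)
  p <- predict(fit, newdata = toy[fold == i, ], type = "response")
  mean((p > 0.5) == (toy$over50k[fold == i] == 1))
})

cv_accuracy <- mean(fold_acc)
```

Averaging the per-fold accuracies gives a single estimate that depends less on any one train/test split.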
primaryobjects / plot.png
Last active June 27, 2019 13:38
Sentiment analysis of Apple tweets using CART, random forests, and logistic regression. Random forests give the best accuracy, at 89%.
plot.png
primaryobjects / dplot.png
Last active May 18, 2016 16:42
Predictive coding analysis of Enron emails to identify documents responsive to the energy scandal, using a CART model in R.
dplot.png
primaryobjects / plot.png
Last active May 18, 2016 17:52
Wikipedia vandalism detection using a CART (classification and regression tree) model, with an accuracy of 0.7188306.
plot.png
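An accuracy figure like the one quoted for the vandalism model is typically the share of correct predictions in a confusion matrix. The sketch below shows that arithmetic on made-up label vectors. It does not reproduce the gist's data or its 0.7188306 result.

```r
# Made-up labels: 1 = vandalism, 0 = not vandalism.
actual    <- c(1, 0, 1, 1, 0, 0, 1, 0, 0, 1)
predicted <- c(1, 0, 0, 1, 0, 1, 1, 0, 0, 0)

# Rows are actual classes, columns are predicted classes.
confusion <- table(actual, predicted)

# Correct predictions sit on the diagonal.
accuracy <- sum(diag(confusion)) / sum(confusion)
```

Here 7 of the 10 toy predictions match, so `accuracy` is 0.7.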
primaryobjects / clinical_trials.R
Created May 18, 2016 18:33
Analysis of clinical trials using NLP with a CART model to determine whether an abstract and title are applicable to a study.
packages <- c('tm', 'SnowballC', 'caTools', 'rpart', 'rpart.plot', 'randomForest', 'ROCR')
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}
library(tm)
library(SnowballC)
library(caTools)
library(rpart)
library(rpart.plot)
library(randomForest)
primaryobjects / emails.R
Created May 18, 2016 19:18
Detection of ham and spam emails from a data set using logistic regression, CART, and random forests. Random forests perform best on both the training and test sets, while logistic regression overfits the training set.
packages <- c('tm', 'SnowballC', 'caTools', 'rpart', 'rpart.plot', 'randomForest', 'ROCR')
if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}
library(tm)
library(SnowballC)
library(caTools)
library(rpart)
library(rpart.plot)
library(randomForest)
primaryobjects / movies.R
Last active May 23, 2016 17:43
Basic movie recommendation system using content filtering (grouping together movies with similar characteristics, i.e., genre).
# Basic movie recommendation system using content filtering (grouping together movies with similar characteristics, i.e., genre).
# Download dataset, if it does not exist.
fileName <- 'movieLens.txt'
if (!file.exists(fileName)) {
  # Optional extra argument: method="curl"
  download.file('http://files.grouplens.org/datasets/movielens/ml-100k/u.item', fileName)
}
# The file is pipe-delimited and has no header row.
movies <- read.table(fileName, header=FALSE, sep='|', quote='"')
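The preview ends once the file is loaded, so the grouping step is not shown. As a hedged sketch of content filtering by genre, the example below clusters a tiny made-up matrix of genre indicators with hierarchical clustering and then lists the titles that share a cluster with a chosen movie. The movie names, the genre columns, the `ward.D` linkage, and `k = 2` are all illustrative assumptions, not values from the gist.

```r
# Made-up genre indicators (1 = movie belongs to the genre).
genres <- matrix(
  c(1, 0, 0,
    1, 0, 0,
    0, 1, 1,
    0, 1, 1,
    1, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D", "E"),
                  c("action", "comedy", "drama"))
)

# Pairwise distances between movies, then agglomerative clustering.
tree <- hclust(dist(genres, method = "euclidean"), method = "ward.D")

# Cut the tree into two groups of similar movies.
cluster <- cutree(tree, k = 2)

# Candidates to recommend to someone who liked movie "A".
similar_to_a <- names(cluster)[cluster == cluster["A"]]
```

In the toy data, movies A, B, and E share the action genre and end up together, while C and D form the other group.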
primaryobjects / clustering.R
Created May 23, 2016 18:36
Clustering with hierarchical and k-means in R for image analysis.
library(flexclust)
# Download dataset, if it does not exist.
fileName <- 'flower.csv'
if (!file.exists(fileName)) {
  # , method="curl"
  download.file(paste0('https://d37djvu3ytnwxt.cloudfront.net/asset-v1:MITx+15.071x_3+1T2016+type@asset+block/', fileName), fileName)
}
fileName <- 'healthy.csv'
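The preview is cut off before any clustering happens. For orientation, a minimal k-means sketch on image-like data follows. It treats a short made-up vector as pixel intensities and splits it into two clusters. The intensity values, `centers = 2`, and `nstart = 10` are assumptions chosen for illustration, and the real files would first need to be flattened into a vector.

```r
set.seed(1)

# Made-up pixel intensities in [0, 1]: a dark group and a bright group.
intensity <- c(0.05, 0.10, 0.08, 0.12, 0.90, 0.95, 0.88, 0.93)

# k-means on the one-dimensional intensities; nstart reruns the
# algorithm from several random starts and keeps the best fit.
km <- kmeans(intensity, centers = 2, nstart = 10)

# Cluster labels are arbitrary, so identify the dark cluster
# through the first (dark) pixel instead of a fixed label.
dark_cluster <- km$cluster[1]
is_dark <- km$cluster == dark_cluster
```

For an image, `km$cluster` could be reshaped back to the image dimensions to display the segmentation.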