@diamonaj
Created November 28, 2016 09:54
# Load the Titanic training data from a local CSV file, keeping text columns
# as character vectors (stringsAsFactors = FALSE)
mm <- read.csv("trainTitanic.csv", stringsAsFactors = FALSE)
# delete columns (PassengerId, Name, SibSp, Parch, Ticket, Cabin, Embarked),
# keeping Survived, Pclass, Sex, Age, Fare
mm <- mm[,-c(1, 4, 7, 8, 9, 11, 12)]
# dimensions are 891 x 5
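# drop rows with missing values
mm <- na.omit(mm)
# dimensions are 714 x 5
# tree() expects numeric or factor predictors, so store Sex as a factor
mm$Sex <- factor(mm$Sex)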
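# fix the random seed so the sampled split is reproducible
set.seed(123)
# draw 20 rows (without replacement) as a deliberately small training set
training.obs <- sample(1:714, 20, replace = FALSE)
training.set <- mm[training.obs,]
# the remaining 694 rows; test.obs indexes into left.overs, not mm
left.overs <- mm[-training.obs,]
test.obs <- sample(1:(714 - 20), 5, replace = FALSE)
test.set <- left.overs[test.obs,]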
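# Sketch (added for illustration, not part of the original script): a quick
# check that two subsets of the same data frame share no rows. Row names
# survive subsetting with [ , ], so they can act as row identifiers here.
# rows.disjoint is a hypothetical helper name.
rows.disjoint <- function(a, b) {
  length(intersect(rownames(a), rownames(b))) == 0
}
# usage: stopifnot(rows.disjoint(training.set, test.set))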
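# fit a tree with the "tree" package; Survived is numeric (0/1), so tree()
# fits a regression tree here
library(tree)
titan <- tree(Survived ~ Pclass + Sex + Age + Fare, data = training.set, mincut = 1)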
# show how the tree changes as 'mincut' changes from 1 to 2 to 3 to 4 to 5.
plot(titan)
text(titan, pretty = 0)
titan
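# Sketch (added for illustration, not part of the original script): one way to
# run the mincut comparison described above. It assumes the "tree" package is
# installed; leaves.by.mincut is a hypothetical helper name. Sex is converted
# to a factor inside the helper because tree() expects numeric or factor
# predictors.
leaves.by.mincut <- function(data, mincuts = 1:5) {
  data$Sex <- factor(data$Sex)
  sapply(mincuts, function(k) {
    fit <- tree::tree(Survived ~ Pclass + Sex + Age + Fare, data = data, mincut = k)
    # count terminal nodes; a larger mincut tends to give a smaller tree
    sum(fit$frame$var == "<leaf>")
  })
}
# usage: leaves.by.mincut(training.set)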
# alternative tree library ("rpart"); method = "class" fits a classification tree
library(rpart)
titan.rpart <- rpart(Survived ~ Pclass + Sex + Age + Fare, minsplit = 2, method = "class", data = training.set)
plot(titan.rpart); text(titan.rpart, use.n = TRUE)
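# Sketch (added for illustration, not part of the original script): test.set
# is created above but never used, so one possible next step is to score the
# rpart fit on it. holdout.accuracy is a hypothetical helper name; it assumes
# a fit made with method = "class" and a Survived column in newdata.
holdout.accuracy <- function(fit, newdata) {
  predicted <- predict(fit, newdata = newdata, type = "class")
  # compare as character so factor labels ("0"/"1") match numeric 0/1
  mean(as.character(predicted) == as.character(newdata$Survived))
}
# usage: holdout.accuracy(titan.rpart, test.set)
# with only 5 test rows the estimate is very coarse (steps of 0.2)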