@truncs
Created April 6, 2012 06:24
Uniform sampling
# Get a subset of values from the training set where days spent is zero
temp <- subset(train, DaysInHospital_Y2 == log1p(0))
# Get 100 samples from it with replacement and add it to a new dataframe
uniform_set <- temp[sample(nrow(temp), 100, replace=TRUE),]
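# Repeat for each other value: re-subset temp, then append 100 more samples.
# log1p(1) is only an illustrative next value. rbind stacks rows; merge(..., all=T) would join on every column instead.
temp <- subset(train, DaysInHospital_Y2 == log1p(1))
uniform_set <- rbind(uniform_set, temp[sample(nrow(temp), 100, replace=TRUE),])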
# Train a tree on it (tree() comes from the tree package)
library(tree)
uniform_tree <- tree(DaysInHospital_Y2 ~ ., uniform_set)
plot(uniform_tree, type="uniform"); text(uniform_tree, pretty=0)
# Sample a smaller set from the old training set
uniform_test <- train[sample(nrow(train), 400, replace=T),]
# Predict values
result <- predict(uniform_tree, uniform_test, type="vector")
# Append the result
uniform_test$predicted <- result
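# Calculate the RMSE. The predictions are on the same scale as DaysInHospital_Y2
# (the tree was trained on that column as-is), so no extra log1p is applied.
sqrt(mean((uniform_test$DaysInHospital_Y2 - uniform_test$predicted)^2))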
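
The per-value sampling above can be generalized so that every distinct value of the target is handled in one pass. The following is a minimal sketch under stated assumptions, not the author's method: `train_demo` is a made-up stand-in for `train`, and `sample_uniform` and `rmse` are hypothetical helper names introduced only for illustration.

```r
# Hypothetical stand-in for the training set; the real `train` is not shown in the gist.
set.seed(42)
train_demo <- data.frame(
  DaysInHospital_Y2 = log1p(sample(0:3, 1000, replace = TRUE,
                                   prob = c(0.70, 0.15, 0.10, 0.05))),
  feature = rnorm(1000)
)

# Draw n rows with replacement from each distinct target value and stack them,
# so every value is equally represented in the result.
sample_uniform <- function(df, target, n = 100) {
  groups <- split(df, df[[target]])
  sampled <- lapply(groups, function(g) {
    g[sample(nrow(g), n, replace = TRUE), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

uniform_demo <- sample_uniform(train_demo, "DaysInHospital_Y2", n = 100)
table(uniform_demo$DaysInHospital_Y2)  # each distinct value appears n times

# RMSE between two numeric vectors that are on the same scale.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
```

With helpers like these, the final step of the script would reduce to something like `rmse(uniform_test$DaysInHospital_Y2, uniform_test$predicted)`.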