@m-manu
Created March 30, 2016 12:57
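Handy R script to train a random forest on the data in 'training.csv' and predict the output for the inputs in 'test.csv'. The last column of 'training.csv' is the output to predict, and 'test.csv' must contain exactly the remaining (input) columns.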
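# Input files: the last column of training.csv is the output;
# test.csv holds the same input columns without the output
trainingCsv <- "./training.csv"
testCsv <- "./test.csv"
trainingData <- read.csv(trainingCsv)
testData <- read.csv(testCsv)
numColumns <- ncol(trainingData)
columnNames <- colnames(trainingData)
# Sanity checks: test.csv has exactly one column fewer,
# and its columns match the training inputs in name and order
stopifnot(numColumns == ncol(testData) + 1)
stopifnot(identical(columnNames[-numColumns], colnames(testData)))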
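library(randomForest)
# The output column must be defined before it is used in randomForest()
outputColumn <- columnNames[numColumns]
# Regression forest: the output is coerced to numeric.
# mtry is capped at the number of inputs (randomForest would otherwise
# reset it with a warning); drop = FALSE keeps x a data frame
rf <- randomForest(
  x = trainingData[, -numColumns, drop = FALSE],
  y = as.numeric(trainingData[[outputColumn]]),
  ntree = 100,
  mtry = min(10, numColumns - 1)
)
# Store the predictions as a new output column in the test data
testData[outputColumn] <- predict(rf, newdata = testData)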
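View(testData)  # opens the data viewer in an interactive session (e.g. RStudio)
# varImpPlot(rf)  # optional: plot variable importance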