Day 4 of the Beat 5 Kaggle Benchmarks in 5 Days challenge
For the Forest Cover Type Prediction competition on Kaggle, the goal is to predict the predominant type of trees in a given section of forest. The score is based on average classification accuracy for the 7 different tree cover classes.
To beat the all fir/spruce benchmark I obviously tried a random forest. Using the default settings of scikit-learn's RandomForestClassifier, I was able to beat the benchmark with an accuracy score of 0.72718 on the competition leaderboard. By using 100 estimators (versus the default of 10), I was able to raise that accuracy score up to 0.75455.
Using pandas I loaded the train and test data sets into Python.