---
title: "Decision boundaries"
author: "Kornel Kiełczewski"
date: "25 March 2016"
output: html_document
---
Decision boundaries for an MLP, an SVM and a random forest on a somewhat biased test case :)
```{r, echo=FALSE, message=FALSE, warning=FALSE}
set.seed(1)
library(nnet)
library(ggplot2)
library(e1071)
library(randomForest)
```
Let's generate some data:
```{r}
# Note: cos() and sin() take radians, so seq(0, 360, 10) is not a sweep in
# degrees -- the 37 angles land unevenly around each circle, which is the
# 'somewhat biased' part of the test case.
pts <- seq(0, 360, 10)
x1 <- c(cos(pts), cos(pts) * 4, cos(pts) * 6)
x2 <- c(sin(pts), sin(pts) * 4, sin(pts) * 6)
data <- data.frame(x1, x2)
data$y <- as.factor(rep(0:2, each = length(pts)))
```
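A quick check of where the 'bias' comes from: `cos()` and `sin()` interpret `seq(0, 360, 10)` as radians, so the angles wrap around the circle several times and end up unevenly spaced, leaving gaps in each ring. (This snippet is only illustrative; it is not part of the original analysis.)

```{r}
# Reduce the angles mod 2*pi to see how they actually fall on the circle.
angles <- seq(0, 360, 10) %% (2 * pi)
gaps <- diff(sort(angles))
range(gaps)  # the largest gap is several times the smallest
```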
Take a look at the data:
```{r}
data.plot <- ggplot() +
  geom_point(data = data, aes(x = x1, y = x2, color = y)) +
  coord_fixed() +
  theme_bw() +
  xlab('x1') +
  ylab('x2')
print(data.plot)
```
Prepare a grid for the decision boundaries:
```{r}
py <- seq(-6, 6, 0.1)
px <- seq(-6, 6, 0.1)
grid <- expand.grid(px, py)
colnames(grid) <- c('x1', 'x2')
```
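For orientation, `expand.grid` builds the Cartesian product of the two axes, so the grid holds one row per (x1, x2) combination at which each model will later be evaluated:

```{r}
# seq(-6, 6, 0.1) has 121 values, so the grid has 121 * 121 = 14641 rows.
px <- seq(-6, 6, 0.1)
g <- expand.grid(x1 = px, x2 = px)
nrow(g)  # 14641
```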
# MLP | |
Fit an MLP. Note that there is no cross-validation here; the hidden-layer size is somewhat arbitrary.
```{r}
fit <- nnet(y ~ ., data, size = 40)
grid$pred <- as.numeric(predict(fit, grid, type = 'class'))
data.plot + stat_contour(data = grid, aes(x = x1, y = x2, z = pred), alpha = 0.9)
```
As expected, the decision boundary moves closer to the upper circle where training points are 'missing'.
# SVM | |
Fit an SVM. The kernel is explicitly set to a radial one.
```{r}
fit <- svm(y ~ ., data, kernel = 'radial')
grid$pred <- as.numeric(predict(fit, grid, type = 'class'))
data.plot + stat_contour(data = grid, aes(x = x1, y = x2, z = pred), alpha = 0.9)
```
It comes as no surprise that a radial kernel can fit these circles perfectly. | |
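One way to see why: in the single derived feature r = sqrt(x1^2 + x2^2), the three rings are already linearly separable, and a radial kernel can express exactly this kind of distance-based boundary. A minimal sketch (the cut points 2.5 and 5 are my own choice, simply midpoints between the ring radii 1, 4 and 6):

```{r}
pts <- seq(0, 360, 10)
x1 <- c(cos(pts), cos(pts) * 4, cos(pts) * 6)
x2 <- c(sin(pts), sin(pts) * 4, sin(pts) * 6)
y  <- rep(0:2, each = length(pts))
# classify purely by distance from the origin
r <- sqrt(x1^2 + x2^2)
pred <- findInterval(r, c(2.5, 5))
mean(pred == y)  # 1: the radius alone separates the classes perfectly
```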
# Random forest | |
```{r}
fit <- randomForest(y ~ ., data)
grid$pred <- as.numeric(predict(fit, grid, type = 'class'))
data.plot + stat_contour(data = grid, aes(x = x1, y = x2, z = pred), alpha = 0.9)
```
As expected, the decision boundary generated by the trees has many sharp turns.
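The sharp turns follow from how trees partition space: every split tests a single coordinate against a threshold, so the boundary is built from axis-aligned segments. A single `rpart` tree (rpart is not used above; it is only a convenient single-tree illustration) makes this explicit:

```{r}
library(rpart)
pts <- seq(0, 360, 10)
d <- data.frame(x1 = c(cos(pts), cos(pts) * 4, cos(pts) * 6),
                x2 = c(sin(pts), sin(pts) * 4, sin(pts) * 6))
d$y <- factor(rep(0:2, each = length(pts)))
tree <- rpart(y ~ x1 + x2, d)
# every internal node tests x1 or x2, i.e. each split is a vertical or
# horizontal line in the (x1, x2) plane
unique(as.character(tree$frame$var))
```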