@explodecomputer
Last active October 9, 2018 09:07
Overfitting and adjusted r square
# How does adjusted r-square do with overfitting?
rm(list=ls())
set.seed(101)
library(ggplot2)
library(tidyr)
y <- rnorm(1000)
x <- matrix(rnorm(1000 * 1000), 1000, 1000)
res <- expand.grid(var=1:300, r.squared=NA, adj.r.squared=NA, pred.r.squared=NA)
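for(i in 1:nrow(res))
{
  message(i)
  # Fit y on the first res$var[i] columns of pure-noise predictors
  xi <- x[, 1:res$var[i], drop=FALSE]
  ou <- summary(lm(y ~ xi))
  res$r.squared[i] <- ou$r.squared
  res$adj.r.squared[i] <- ou$adj.r.squared
  # In-sample prediction from the fitted slopes (the intercept is dropped, which cor() ignores).
  # The same data are used to fit and to predict, so this coincides with r.squared.
  pred <- xi %*% coefficients(ou)[-1, 1, drop=FALSE]
  res$pred.r.squared[i] <- cor(y, pred)^2
}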
# Reshape to long format: one row per (number of variables, statistic)
res <- gather(res, "key", "value", r.squared, adj.r.squared, pred.r.squared)
ggplot(res, aes(x=var, y=value)) +
  geom_point(aes(colour=key)) +
  geom_line(aes(colour=key)) +
  labs(x="Number of variables", y="Rsq or Adjusted Rsq") +
  scale_colour_brewer(type="qual")
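The script above computes its predicted values on the same rows used to fit the model, so that curve cannot show how badly the overfitted model predicts new data. The following is a minimal sketch of an out-of-sample variant, not part of the original script. It assumes a simple 50/50 split of the rows, and the names `holdout_rsq`, `train` and `test` are hypothetical. Since `y` is pure noise, the held-out squared correlation should stay close to zero however many predictors are added.

```r
# Hypothetical out-of-sample check: fit on one half of the rows, predict the other half.
set.seed(101)
n <- 1000
y <- rnorm(n)
x <- matrix(rnorm(n * 300), n, 300)
train <- 1:(n / 2)          # assumed split: first half for fitting
test <- (n / 2 + 1):n       # second half for evaluation

holdout_rsq <- function(p) {
  xp <- x[, 1:p, drop = FALSE]
  colnames(xp) <- paste0("x", 1:p)
  dat <- data.frame(y = y, xp)
  fit <- lm(y ~ ., data = dat[train, ])
  pred <- predict(fit, newdata = dat[test, ])
  # Squared correlation between held-out y and its prediction
  cor(dat$y[test], pred)^2
}

nvar <- c(1, 50, 100, 200, 300)
out <- data.frame(var = nvar, holdout.r.squared = sapply(nvar, holdout_rsq))
print(out)
```

The resulting `out` data frame could be merged into `res` by `var` and plotted alongside the in-sample curves; only five values of `var` are used here to keep the run short.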