Correlation vs MAE
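# Simulate data; the 'train2' group gets a constant +5 offset in y (a biased training set)
d <- data.frame(x = runif(300, 0, 5))
d$y <- d$x + rnorm(300)
# Assign rows to the three groups in rotation (100 rows each)
d$g <- rep(c('train1', 'train2', 'test'), length.out = nrow(d))
d$y[d$g == 'train2'] <- d$y[d$g == 'train2'] + 5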
plot(d$x, d$y, col = factor(d$g))
# Fit models
m1 <- lm(y ~ x, subset(d, d$g == 'train1'))
m2 <- lm(y ~ x, subset(d, d$g == 'train2'))
# Make predictions
p1 <- predict(m1, newdata = subset(d, d$g == 'test'))
p2 <- predict(m2, newdata = subset(d, d$g == 'test'))
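# Is it clear that m2 is biased? Not from correlation: both predictions are
# increasing linear functions of x, so cor() with the test values is the same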
cor(p1, d$y[d$g == 'test'])
cor(p2, d$y[d$g == 'test'])
# Is it clear that m2 is biased? Yes from mean absolute error (MAE): m2's MAE
# is larger by roughly the +5 offset
mean(abs(p1 - d$y[d$g == 'test']))
mean(abs(p2 - d$y[d$g == 'test']))
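# A minimal sketch of why correlation hides the bias while MAE exposes it.
# Assumption: obs, pred and pred_shifted are illustrative toy vectors, not part
# of the script above. Pearson correlation is unchanged when a constant is
# added to the predictions, but the mean error and the MAE both change.
set.seed(1)
obs <- rnorm(100)
pred <- obs + rnorm(100, sd = 0.5)   # unbiased predictions
pred_shifted <- pred + 5             # same predictions with a constant bias
# Correlation is the same for both sets of predictions
cor(pred, obs)
cor(pred_shifted, obs)
# Mean error (signed bias) differs by exactly the added constant
mean(pred - obs)
mean(pred_shifted - obs)
# MAE is larger for the shifted predictions
mean(abs(pred - obs))
mean(abs(pred_shifted - obs))
# The same signed-bias check on the models above would be, for example:
# mean(p2 - d$y[d$g == 'test'])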