@JeremyMcCormick
Created October 23, 2021 20:27
Model selection using CV
# snippet: not all code included
# revised loop to do cross-validation
# assumes bodyfat, allModels, n, nfolds, cvgroups, allCVvalues, and allplots are defined earlier
y = bodyfat$BodyFatSiri # observed response, the same for every model
for (m in seq_along(allModels)) {
  allpredicted = rep(NA, n) # storage for honest predictions
  for (ii in seq_len(nfolds)) { # ii is an easier string to search for index
    groupii = (cvgroups == ii)
    trainset = bodyfat[!groupii, ] # all data EXCEPT for group ii
    testset = bodyfat[groupii, ] # data in group ii
    modelfit = lm(allModels[[m]], data = trainset) # fit to train set
    predicted = predict(modelfit, newdata = testset) # predict for test set
    allpredicted[groupii] = predicted # store in ordered locations
  }
  CVvalue = mean((y - allpredicted)^2) # CV estimate of mean squared prediction error
  allCVvalues[m] = CVvalue
  obs.pred = data.frame(y = y, yhat = allpredicted)
  allplots[[m]] <- gf_point(y ~ yhat, data = obs.pred) %>%
    gf_labs(title = paste("Compare Observed to Predicted for Model", m),
            y = "Observed", x = "Predicted")
}
allCVvalues # one CV error per candidate model
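
The loop above relies on objects that the snippet does not define (`allModels`, `n`, `nfolds`, `cvgroups`, `allCVvalues`). The following is a minimal sketch of what that setup might look like, followed by picking the model with the smallest CV error. It is not the original code. The built-in `mtcars` data stands in for `bodyfat`, and the three formulas, the seed, the choice of 5 folds, and the name `bestModel` are all assumptions made for illustration. The plotting step is left out so that the sketch needs only base R.

```r
# Hypothetical setup: mtcars stands in for the bodyfat data (assumption)
set.seed(1)                       # arbitrary seed so fold assignment is reproducible
dat = mtcars
n = nrow(dat)
nfolds = 5                        # assumed number of folds

# balanced fold labels 1..nfolds, shuffled across the rows
cvgroups = sample(rep(seq_len(nfolds), length.out = n))

# candidate models stored as a list of formulas (assumed structure of allModels)
allModels = list(
  mpg ~ wt,
  mpg ~ wt + hp,
  mpg ~ wt + hp + qsec
)
allCVvalues = rep(NA, length(allModels))  # one CV error per model

y = dat$mpg
for (m in seq_along(allModels)) {
  allpredicted = rep(NA, n)       # storage for out-of-fold predictions
  for (ii in seq_len(nfolds)) {
    groupii = (cvgroups == ii)
    modelfit = lm(allModels[[m]], data = dat[!groupii, ])               # fit without fold ii
    allpredicted[groupii] = predict(modelfit, newdata = dat[groupii, ]) # predict fold ii
  }
  allCVvalues[m] = mean((y - allpredicted)^2)
}

bestModel = which.min(allCVvalues)  # index of the model with the smallest CV error
allModels[[bestModel]]
```

Because every observation is predicted by a model that never saw it during fitting, the values in `allCVvalues` are comparable across models of different size, which is what makes `which.min` a reasonable selection rule here.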