Created
June 11, 2012 17:46
Confusion matrix for a logistic glm model in R. Helpful for comparing glm to randomForests.
confusion.glm <- function(data, model) {
  prediction <- ifelse(predict(model, data, type='response') > 0.5, TRUE, FALSE)
  confusion <- table(prediction, as.logical(model$y))
  confusion <- cbind(confusion, c(1 - confusion[1,1]/(confusion[1,1]+confusion[2,1]), 1 - confusion[2,2]/(confusion[2,2]+confusion[1,2])))
  confusion <- as.data.frame(confusion)
  names(confusion) <- c('FALSE', 'TRUE', 'class.error')
  confusion
}
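Defining the function prints nothing by itself; it has to be called with a data frame and a fitted model. Here is a minimal sketch of a call. The built-in `mtcars` data set and the formula `am ~ wt` are stand-ins chosen for illustration and are not part of the original gist. The function is repeated so the example runs on its own.

```r
# Function from the gist, repeated so this example is self-contained.
confusion.glm <- function(data, model) {
  prediction <- ifelse(predict(model, data, type='response') > 0.5, TRUE, FALSE)
  confusion <- table(prediction, as.logical(model$y))
  confusion <- cbind(confusion, c(1 - confusion[1,1]/(confusion[1,1]+confusion[2,1]), 1 - confusion[2,2]/(confusion[2,2]+confusion[1,2])))
  confusion <- as.data.frame(confusion)
  names(confusion) <- c('FALSE', 'TRUE', 'class.error')
  confusion
}

# Assumption: mtcars (shipped with R) stands in for real data,
# with the 0/1 column `am` as the response.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Rows are predicted classes, columns are actual classes.
# Pass the same data the model was fitted on, because the
# actual classes are read from model$y.
result <- confusion.glm(mtcars, fit)
print(result)
```

Note that the truth is taken from `model$y`, so this version only makes sense when `data` is the training data.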
When comparing against randomForest confusion matrices, I find it easier to have true values on the left margin and predicted values on the top margin, since that is how randomForest presents them. The version below does that, and adds some flexibility: new data is optional, and the cutoff value is customizable.
confusion.glm <- function(model, des.mat=NULL, response=NULL, cutoff=0.5) {
  if (missing(des.mat)) {
    prediction <- predict(model, type='response') > cutoff
    confusion <- table(as.logical(model$y), prediction)
  } else {
    if (missing(response) || class(response) != "logical") {
      stop("Must give logical vector as response when des.mat given")
    }
    prediction <- predict(model, des.mat, type='response') > cutoff
    confusion <- table(response, prediction)
  }
  confusion <- cbind(confusion,
                     c(1 - confusion[1,1] / rowSums(confusion)[1],
                       1 - confusion[2,2] / rowSums(confusion)[2]))
  confusion <- as.data.frame(confusion)
  names(confusion) <- c('FALSE', 'TRUE', 'class.error')
  return(confusion)
}
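A rough sketch of both ways to call this version, once with no extra data and once with a data frame, a logical response, and a custom cutoff. The `mtcars` data, the formula `am ~ wt`, and the cutoff of 0.3 are illustrative assumptions; for simplicity the "new" data here is just `mtcars` again, where real use would pass held-out rows.

```r
# Function from the comment above, repeated so this example is self-contained.
confusion.glm <- function(model, des.mat=NULL, response=NULL, cutoff=0.5) {
  if (missing(des.mat)) {
    prediction <- predict(model, type='response') > cutoff
    confusion <- table(as.logical(model$y), prediction)
  } else {
    if (missing(response) || class(response) != "logical") {
      stop("Must give logical vector as response when des.mat given")
    }
    prediction <- predict(model, des.mat, type='response') > cutoff
    confusion <- table(response, prediction)
  }
  confusion <- cbind(confusion,
                     c(1 - confusion[1,1] / rowSums(confusion)[1],
                       1 - confusion[2,2] / rowSums(confusion)[2]))
  confusion <- as.data.frame(confusion)
  names(confusion) <- c('FALSE', 'TRUE', 'class.error')
  return(confusion)
}

# Assumption: mtcars stands in for real data, with 0/1 column `am` as response.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# 1. In-sample: true classes come from model$y, default cutoff of 0.5.
in_sample <- confusion.glm(fit)
print(in_sample)

# 2. Explicit data: response must be a logical vector, so convert the 0/1 column.
#    The cutoff of 0.3 is an arbitrary illustrative value.
with_data <- confusion.glm(fit, des.mat = mtcars,
                           response = as.logical(mtcars$am), cutoff = 0.3)
print(with_data)
```

Rows are true classes and columns are predicted classes, so each `class.error` is the share of that row's cases that were misclassified.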
What do you do with that after you have run the code? There is no output, so I assume some additional code follows this...?
If the prediction vector is all TRUE or all FALSE (class error of 100%), this throws an error. It is fixed by changing the table line to:
confusion <- table(factor(prediction, levels = c(FALSE, TRUE)), as.logical(model$y))
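To see why that helps, here is a small sketch with made-up vectors (`prediction` and `truth` are hypothetical toy values, not output from a real model). When every prediction is FALSE, `table()` produces only one row, so indexing the second row fails; fixing the factor levels keeps both rows, with zero counts where needed.

```r
# Hypothetical toy data: every prediction is FALSE.
prediction <- c(FALSE, FALSE, FALSE, FALSE)
truth      <- c(TRUE,  FALSE, FALSE, TRUE)

# Without fixed levels the table has a single row, so an index
# such as plain[2, 1] would fail with "subscript out of bounds".
plain <- table(prediction, truth)
print(dim(plain))

# With both levels declared, the TRUE row is kept with zero counts.
fixed <- table(factor(prediction, levels = c(FALSE, TRUE)), truth)
print(fixed)

# The class-error arithmetic from the original function now works.
class.error <- c(1 - fixed[1,1] / (fixed[1,1] + fixed[2,1]),
                 1 - fixed[2,2] / (fixed[2,2] + fixed[1,2]))
print(class.error)
```

The same trick could presumably be applied to the true-value vector if one class might be absent from it, though the comment above only covers the prediction side.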
Great! Thanks.
Thank you for posting...worked like a charm.