@nhoffman
Created June 29, 2011 05:13
Use mutual information to find a cutoff separating two distributions
## Use mutual information to define a value separating two
## distributions.

entropy <- function(x, y){
  ## Shannon entropy of x, or joint entropy of x and y
  if(missing(y)){
    freqs <- table(x)/length(x)
  }else{
    stopifnot(length(x) == length(y))
    freqs <- table(paste(x, y))/length(x)
  }
  -sum(freqs * log(freqs))
}

mutinfo <- function(x, y){
  ## mutual information: I(X;Y) = H(X) + H(Y) - H(X,Y)
  entropy(x) + entropy(y) - entropy(x, y)
}
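A quick sanity check of the two helpers (a hypothetical example, not part of the original gist; the functions are restated here so the snippet runs standalone): mutual information should be zero when one variable is constant, and should equal the marginal entropy when the two variables are identical.

```r
## Restated from the gist above so this snippet is self-contained.
entropy <- function(x, y){
  if(missing(y)) freqs <- table(x)/length(x)
  else freqs <- table(paste(x, y))/length(x)
  -sum(freqs * log(freqs))
}
mutinfo <- function(x, y) entropy(x) + entropy(y) - entropy(x, y)

x <- rep(c('a', 'b'), each = 4)
stopifnot(abs(mutinfo(x, rep('c', 8))) < 1e-12)    # constant y carries no information: I = 0
stopifnot(abs(mutinfo(x, x) - entropy(x)) < 1e-12) # y == x: I(X;X) = H(X)
```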
## some pretend data: two overlapping normal distributions
data <- data.frame(
  V = c(rnorm(100) + 5, rnorm(100) + 7),
  W = c(rep('within', 100), rep('between', 100))
)
## mutual information between the group labels and the indicator
## V < d, evaluated at each candidate cutoff d
cuts <- sort(unique(round(data$V, 1)))
vals <- with(data, sapply(cuts, function(d) mutinfo(W, V < d)))
## D, the cutoff of maximum mutual information, best divides the
## "withins" from the "betweens"
D <- cuts[which.max(vals)]
par(mfrow = c(1, 2))
plot(cuts, vals, xlab = "cutoff d", ylab = "mutual information")
abline(v = D, col = 'red')
boxplot(V ~ W, data = data)
abline(h = D, col = 'red')