The Bayesian approach to model selection is a subject you'll
like. The basic idea is to compute the "Bayes factor":
http://en.wikipedia.org/wiki/Bayes_factor .
As the page says, "Bayesian inference has been put forward as a
theoretical justification for and generalization of Occam's
razor"
( http://en.wikipedia.org/wiki/Occam%27s_razor ).
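As a toy illustration (my own hypothetical numbers, not from any of the references): for a coin-flip experiment the Bayes factor comes out in closed form, comparing a fair coin against a coin with unknown bias under a uniform prior:

```python
from math import lgamma, exp

# Toy Bayes factor: observe a sequence with k heads in n flips.
# M0 = fair coin; M1 = unknown bias theta with a uniform prior.
n, k = 100, 70

# Evidence under M0: every sequence has probability (1/2)^n.
evidence_m0 = 0.5 ** n

# Evidence under M1: integrate theta^k (1-theta)^(n-k) over [0, 1],
# which is the Beta function B(k+1, n-k+1) = k! (n-k)! / (n+1)!.
evidence_m1 = exp(lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2))

bayes_factor = evidence_m1 / evidence_m0
print(bayes_factor)  # well above 1: the data favor the biased-coin model
```

With 70 heads in 100 flips the evidence strongly favors M1; with k near 50, the same computation would favor the simpler fair-coin model, which is the Occam's-razor effect in miniature.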
The Bayes factor can be approximated under some assumptions,
leading to a simple penalized maximum likelihood called the
"BIC" (basically a chi-squared term, which goes down as the
model gets more complicated, plus a term linear in the number of
parameters, which compensates and produces a "best" value of the
number of free parameters):
http://en.wikipedia.org/wiki/Bayesian_information_criterion
The paper it comes from is very short!
http://www.andrew.cmu.edu/user/kk3n/simplicity/schwarzbic.pdf
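A quick sketch of that trade-off (the log-likelihoods and parameter counts below are made up for illustration; lower BIC is better):

```python
from math import log

def bic(log_likelihood, k, n):
    """Schwarz's BIC: k * ln(n) - 2 * ln(L-hat).

    The -2 ln(L-hat) term shrinks as the model grows more flexible;
    the k * ln(n) penalty grows linearly in the parameter count.
    """
    return k * log(n) - 2.0 * log_likelihood

# Hypothetical fits to n = 500 points: model B's three extra
# parameters buy only a tiny gain in maximized log-likelihood.
n = 500
bic_a = bic(log_likelihood=-1042.0, k=3, n=n)
bic_b = bic(log_likelihood=-1040.5, k=6, n=n)
print(bic_a, bic_b)  # bic_a is lower: the simpler model wins here
```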
Here's a more pedagogical derivation by my friend Harish Bhat:
http://nscs00.ucmerced.edu/~nkumar4/BhatKumarBIC.pdf
David MacKay, in his PhD thesis (1991), interpreted the Bayes
factor as a function of the data, and suggested a curve which
explains the Bayesian approach to model selection graphically:
http://authors.library.caltech.edu/13792/1/MACnc92a.pdf
Here's a great paper about the "Bayesian Occam's razor" based on
actually computing and plotting p(D|M) -- the curve you get when
you integrate over parameters, as a function of the data:
http://mlg.eng.cam.ac.uk/zoubin/papers/05occam/occam.pdf
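Here's a minimal sketch of that picture (my own toy version, not the paper's code): for n coin flips, tabulate p(D|M) over every possible data set (k heads, k = 0..n) and notice that the simple model concentrates its evidence on a few data sets while the flexible model spreads it thin:

```python
from math import comb, lgamma, exp

n = 10

# M0 (fair coin) piles its evidence near k = n/2 ...
p_m0 = [comb(n, k) * 0.5 ** n for k in range(n + 1)]

# ... while M1 (uniform prior on the bias) spreads evidence evenly:
# the Beta-binomial with alpha = beta = 1 gives exactly 1/(n+1) per k.
p_m1 = [comb(n, k) * exp(lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2))
        for k in range(n + 1)]

# Each model's evidence is a normalized distribution over data sets,
# so the flexible model's extra coverage at extreme k is paid for
# by lower evidence near k = n/2 -- the Bayesian Occam's razor.
print(sum(p_m0), sum(p_m1))  # both sum to 1
print(p_m0[n // 2] > p_m1[n // 2])  # True: M0 wins on "typical" data
```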
This whole approach can also be used for causality. The idea is
to find p(D|M) for different choices of M (model), where every
model is a possible causal graph of which variables influence
which other variables. To make this something you can compute,
you have to choose a particular case. Heckerman illustrated how
this can be done with categorical variables, in which case all
the relations among the random variables are conditional
probability tables, over which one integrates to compute Bayes
factors. His 1995 tutorial is awesome and oft-cited:
http://research.microsoft.com/pubs/69588/tr-95-06.pdf
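To make that recipe concrete, here's a minimal sketch (hypothetical counts, and a symmetric Dirichlet(1) prior standing in for Heckerman's more careful prior choices) that scores "X and Y independent" against "X causes Y" by integrating out the conditional probability tables:

```python
from math import lgamma

def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of categorical counts under a
    symmetric Dirichlet(alpha) prior (the table is integrated out)."""
    n = sum(counts)
    a0 = alpha * len(counts)
    return (lgamma(a0) - lgamma(a0 + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

# Hypothetical binary data: joint counts of (X, Y), clearly dependent.
#            Y=0  Y=1
counts = [[40,  10],   # X=0
          [ 5,  45]]   # X=1

x_counts = [sum(row) for row in counts]        # marginal counts of X
y_counts = [sum(col) for col in zip(*counts)]  # marginal counts of Y

# Model 1: X and Y independent -> two unrelated tables.
log_ev_indep = (log_dirichlet_multinomial(x_counts)
                + log_dirichlet_multinomial(y_counts))

# Model 2: X -> Y -> a table for X, plus one table for Y per value of X.
log_ev_x_to_y = (log_dirichlet_multinomial(x_counts)
                 + sum(log_dirichlet_multinomial(row) for row in counts))

log_bayes_factor = log_ev_x_to_y - log_ev_indep
print(log_bayes_factor)  # positive: the data favor X -> Y
```

Note that with these priors the graph X -> Y and the reversed graph Y -> X get the same score on two binary variables, which is why Heckerman's tutorial spends so much care on priors and on which structures are distinguishable from observational data.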