{
"metadata": {
"language": "Julia",
"name": "",
"signature": "sha256:c86da477b9840e0a9bb395db3db2298ec9a95e9278d664b3708e05647eae1284"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Under the Bayesian paradigm, inference is based on the posterior probability over the parameters of interest. It's helpful to think of our inferences being conditional on a given model, $M$ with a parameter vector $\\theta \\subset \\Theta$. Given a dataset, $D$, and a model, the posterior distribution of the parameter values is given by Bayes' theorem.\n",
"\n",
"\\begin{align}\n",
"Pr(\\theta|D,M) = \\frac{Pr(D|\\theta,M)Pr(\\theta|M)}{Pr(D|M)}\n",
"\\end{align}\n",
"\n",
"$Pr(D|\\theta,M)$ is the likelihood function, $Pr(\\theta|M)$ is the prior probability , and $Pr(D|M)$ is known as the marginal likelihood, predictive probability, or evidence. $Pr(D|M)$ is a normalising constant that ensures that $Pr(\\theta|D,M)$ is a probability.\n",
"\n",
"\\begin{align}\n",
"Pr(D|M) = \\int_{\\Theta}\\Pr(D|\\theta,M)Pr(\\theta|M) d\\theta\n",
"\\end{align}\n",
"\n",
"However, one often wants to compare the fit of different models. As a function of the model $M$, the marginal likelihood can be interpreted sa the likelihood of the model $M$ given the data $D$. Hence, to choose between several models, one simply chooses the one with the highest marginal likelihood. When comparing two models,say $M_0$ and $M_1$, a ratio of marginal likelihoods, known as the Bayes factor, $K$, is usually defined:\n",
"\n",
"\\begin{align}\n",
"K_{01} = \\frac{Pr(D|M_1)}{Pr(D|M_0)}\n",
"\\end{align}\n",
"\n",
"Interpretations of the Bayes factor have been provided by [Jeffreys](http://books.google.co.uk/books?id=vh9Act9rtzQC) and [Kass and Raftery](http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1995.10476572).\n",
"\n",
"Unfortunately, in most cases, it is not possible to obtain an exact solution of the marginal likelihood. A number of approaches have been described to obtain an approximate numerical estimate of the marginal likelihood; here, I illustrate two approaches based on *tempering*.\n",
"\n",
"# A simple example\n",
"\n",
"Before I describe what tempering means in this context, let's consider a simple example, for which there *is* an analytical solution for the marginal likelihood. Consider the problem of fitting a set of $n=100$ exponential random variables, $X$ with parameter $\\lambda=3$.\n",
"\n",
"We can generate these in Julia as follows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"using Distributions\n",
"srand(1)\n",
"lambda=3.0\n",
"n=100\n",
"x=rand(Exponential(1.0/lambda),n);"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However, as I want to compare with my R code, I'll use the same random data as I did in my R code."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"x=[0.251727277709448, 0.393880926369035, 0.0485689089012643, 0.0465984206228326, \n",
"0.145356208593058, 0.964989512488025, 0.409854017804964, 0.179894279999038, \n",
"0.318855831232818, 0.0490153301679842, 0.463578376270143, 0.254009951823358, \n",
"0.41253451691161, 1.47464473922761, 0.351514389103556, 0.34508131534358, \n",
"0.625345057469416, 0.218248879071325, 0.11231115879491, 0.196159907151014, \n",
"0.788171751007882, 0.213964196077238, 0.0980401292132835, 0.18862184152628, \n",
"0.0353575410942237, 0.0198130534651379, 0.192904154459635, 1.31964428405416, \n",
"0.39110403527359, 0.332270985081281, 0.478428447791025, 0.0124228421288232, \n",
"0.108003384278466, 0.44015597643793, 0.0678367833438553, 0.340908625770017, \n",
"0.100580311380327, 0.241738101143046, 0.250514230575028, 0.0783424836236563, \n",
"0.359960379064981, 0.342748967916672, 0.430753882558053, 0.417701784817779, \n",
"0.18488046588997, 0.100427665458216, 0.431041552050316, 0.331518596024219, \n",
"0.17139143201833, 0.669277467426467, 0.140747481646637, 0.72625752274651, \n",
"1.07259633924935, 0.185943118296564, 0.19820588392516, 0.325798601941002, \n",
"0.0699555268511176, 0.103149285384764, 0.368645422767504, 0.258062588036958, \n",
"0.0298913594374683, 0.36939221892146, 0.0824214177800864, 0.523995615513929, \n",
"1.61093758162402, 0.143710710729162, 0.910129771215431, 0.378943805090106, \n",
"0.271122748654481, 0.279002163557281, 0.594921801513742, 0.770823875400138, \n",
"0.969962422990418, 0.0951969947976371, 0.129595590685436, 0.0173518159426749, \n",
"0.117290165961365, 0.521747115103197, 0.271511935442642, 0.919747929357203, \n",
"0.128731187824502, 0.336085485590899, 0.272838062945121, 0.0197537352020542, \n",
"0.761284491005254, 0.268056970386112, 0.527898693352688, 0.411263835931219, \n",
"0.448548006190264, 0.700125743282549, 0.345032382432028, 0.149150631080071, \n",
"0.347869504850014, 0.0869427483218412, 0.227076369337738, 0.0879127546757239, \n",
"0.148868551484028, 0.0702022976862887, 0.0441904686409075, 0.116296115132973\n",
"]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 2,
"text": [
"100-element Array{Float64,1}:\n",
" 0.251727 \n",
" 0.393881 \n",
" 0.0485689\n",
" 0.0465984\n",
" 0.145356 \n",
" 0.96499 \n",
" 0.409854 \n",
" 0.179894 \n",
" 0.318856 \n",
" 0.0490153\n",
" \u22ee \n",
" 0.149151 \n",
" 0.34787 \n",
" 0.0869427\n",
" 0.227076 \n",
" 0.0879128\n",
" 0.148869 \n",
" 0.0702023\n",
" 0.0441905\n",
" 0.116296 "
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The likelihood of the data given the rate parameter $\\lambda$ is as follows.\n",
"\n",
"\\begin{align}\n",
"Pr(X|\\lambda) & = \\prod_{i=1}^{n=100} \\lambda \\rm{exp}(-\\lambda x_i) \\cr\n",
"& = \\lambda^n \\rm{exp}(-\\lambda n \\bar{x})\n",
"\\end{align}\n",
"\n",
"where $\\bar{x}$ is the sample mean of $X$.\n",
"\n",
"As described in [Wikipedia](http://en.wikipedia.org/wiki/Exponential_distribution) (which is remarkably handy for distributions), if we assume a Gamma($\\alpha$,$\\beta$) prior on the rate coefficient, the posterior distribution of $\\lambda$ is Gamma($\\alpha+n$,$\\beta+n \\bar{x}$), the conjugate prior for an exponential distribution.\n",
"\n",
"\\begin{align}\n",
"Pr(\\lambda|X,\\alpha, \\beta) & \\propto Pr(X|\\lambda) \\times Pr(\\lambda| \\alpha, \\beta) \\cr\n",
"& = \\lambda^n \\rm{exp}(-\\lambda n \\bar{x}) \\times \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\lambda^{\\alpha-1} \\rm{exp}(-\\lambda \\beta) \\cr\n",
" & = \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\lambda^{\\alpha+n-1} \\rm{exp}(-\\lambda (\\beta + n \\bar{x}))\n",
"\\end{align}\n",
"\n",
"The marginal likelihood of this model can be calculated by integrating $Pr(X|\\lambda) \\times Pr(\\lambda| \\alpha, \\beta)$ over $\\lambda$.\n",
"\n",
"\\begin{align}\n",
"\\int_{\\lambda=0}^{\\infty}\\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\lambda^{\\alpha+n-1} \\rm{exp}(-\\lambda (\\beta + n \\bar{x})) \\; d\\lambda & = \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\int_{0}^{\\infty}\\lambda^{\\alpha+n-1} exp(-\\lambda (\\beta + n \\bar{x})) \\; d\\lambda \\cr\n",
"& = \\frac{\\beta^\\alpha}{\\Gamma(\\alpha)} \\frac{\\Gamma(\\alpha+n)}{(\\beta+ n \\bar{x})^{a+n}}\n",
"\\end{align}\n",
"\n",
"The log marginal likelihood in Julia can be calculated as follows."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function lml(x::Vector{Float64},alpha::Float64,beta::Float64)\n",
" mux=mean(x)\n",
" n=convert(Float64,length(x))\n",
" alpha*log(beta)-(alpha+n)*log(beta+n*mux)+lgamma(alpha+n)-lgamma(alpha)\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 3,
"text": [
"lml (generic function with 1 method)"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For $\\alpha=1$ and $\\beta=1$, the log marginal likelihood for these data is around 3.6."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"alpha=1.0\n",
"beta=1.0\n",
"lml(x,alpha,beta)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"3.6274358925479078"
]
}
],
"prompt_number": 4
},
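{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical check (a sketch, not part of the original analysis), the closed-form result can be compared against brute-force quadrature of the likelihood times the prior. Note that `Gamma` in Distributions.jl is parameterised by shape and scale, so a rate of $\\beta$ corresponds to a scale of $1/\\beta$; `quadgk` was available in Base Julia at the time this notebook was written."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch: integrate likelihood x prior numerically and take logs;\n",
"# the result should be close to lml(x,alpha,beta) above.\n",
"integrand(lam)=exp(length(x)*log(lam)-lam*sum(x))*pdf(Gamma(alpha,1.0/beta),lam)\n",
"log(quadgk(integrand,0.0,Inf)[1])"
],
"language": "julia",
"metadata": {},
"outputs": []
},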
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In many cases, however, we don't have an analytical solution to the posterior distribution or the marginal likelihood."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tempering\n",
"\n",
"Several approaches to calculating marginal likelhoods are based on the idea of tempering, in which we consider running MCMC at a range of different (inverse) 'temperatures', obtaining by raising likelihood to a power between 0 and 1; when the power is 0, we sample from the prior, while when the power is 1, we sample from the posterior. While we use the tempered likelihood to determine acceptance probabilities in the MCMC, we will use samples of the untempered likelihood to compute the marginal likelihood."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function met_temper(x::Vector{Float64},lambda0::Float64,alpha::Float64,beta::Float64,sigma::Float64,temp::Float64,niters::Int64)\n",
" lambdavec=vec(Array(Float64,niters))\n",
" llikvec=vec(Array(Float64,niters))\n",
" lliktvec=vec(Array(Float64,niters))\n",
" lpriorvec=vec(Array(Float64,niters))\n",
" lpostvec=vec(Array(Float64,niters))\n",
" lambda=lambda0\n",
" llik=sum(logpdf(Exponential(1.0/lambda),x))\n",
" llikt=temp*llik\n",
" lprior=logpdf(Gamma(alpha,beta),lambda)\n",
" lpost=llik+lprior\n",
" for i in 1:niters\n",
" lambdas = lambda + rand(Normal(0,sigma),1)[1]\n",
" if lambdas>0\n",
" lliks=sum(logpdf(Exponential(1/lambdas),x))\n",
" llikst=temp*lliks\n",
" lpriors=logpdf(Gamma(alpha,beta),lambdas)\n",
" lposts=llikst+lpriors\n",
" A=exp(lposts-lpost)\n",
" if rand()<A\n",
" lambda,llik,llikt,lprior,lpost=lambdas,lliks,llikst,lpriors,lposts\n",
" end\n",
" end\n",
" lambdavec[i],llikvec[i],lliktvec[i],lpriorvec[i],lpostvec[i]=lambda,llik,llikt,lprior,lpost\n",
" end\n",
" [lambdavec llikvec lliktvec lpriorvec lpostvec]\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"met_temper (generic function with 1 method)"
]
}
],
"prompt_number": 5
},
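{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the full set of tempered chains, here is a quick smoke test (a sketch, not in the original): a short run at inverse temperature $t=1$, using the fraction of iterations at which $\\lambda$ changed as a crude estimate of the acceptance rate."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch: lambda changes exactly when a proposal is accepted, so this\n",
"# gives a rough acceptance-rate estimate for the random walk sampler.\n",
"test=met_temper(x,1.0,alpha,beta,1.0,1.0,1000)\n",
"mean(test[2:end,1] .!= test[1:end-1,1])"
],
"language": "julia",
"metadata": {},
"outputs": []
},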
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I first run the chain at a range of temperatures."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"tempvec=linspace(0,1.0,101).^5.0 # Range of temperatures\n",
"numtemp=length(tempvec)\n",
"pplist=vec(Array(Matrix{Float64},numtemp))\n",
"burnin=1000\n",
"niters=10000\n",
"lambda0=1.0 # Initial condition for lambda\n",
"sigma=1.0 # Dispersion for the random walk sampler\n",
"for i in 1:numtemp\n",
" out=met_temper(x,lambda0,alpha,beta,sigma,tempvec[i],niters+burnin)\n",
" pplist[i]=out\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 6
},
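{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check on the sampler (a sketch, not in the original), the chain at the highest inverse temperature ($t=1$) targets the full posterior, so its sample mean of $\\lambda$ should be close to the analytical posterior mean $(\\alpha+n)/(\\beta+n\\bar{x})$."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Sketch: compare the t = 1 chain's sample mean of lambda with the\n",
"# analytical Gamma(alpha + n, beta + n * xbar) posterior mean.\n",
"post=pplist[numtemp][(burnin+1):(burnin+niters),1]\n",
"[mean(post) (alpha+length(x))/(beta+sum(x))]"
],
"language": "julia",
"metadata": {},
"outputs": []
},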
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Power posteriors\n",
"\n",
"The power posterior approach is based on integrating the expectation of the log likelihood (for a given chain) across the inverse temperatures (see [Friel and Pettit (2008)](http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2007.00650.x/full) and [Lartillot and Philippe (2006)](http://sysbio.oxfordjournals.org/content/55/2/195.short). Friel and Pettitt used a trapezoidal scheme to integrate the likelihood, while a [revised scheme](http://link.springer.com/article/10.1007/s11222-013-9397-1) used an improved trapezoidal scheme, which also requires the variance of the log likelihood.\n",
"\n",
"I first extract the mean and the variance of the log likelihoods."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ell=zeros(numtemp)\n",
"vll=zeros(numtemp)\n",
"for i in 1:numtemp\n",
" out=pplist[i]\n",
" ell[i]=mean(out[burnin:niters,2])\n",
" vll[i]=var(out[burnin:niters,2])\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following uses this information to compute the log marginal likelihood, using the modified trapezoidal scheme."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function ppml(ell::Vector{Float64},vll::Vector{Float64},tempvec::Vector{Float64})\n",
" N=length(ell)\n",
" res=0.0\n",
" for i in 1:(N-1)\n",
" wts = tempvec[i+1]-tempvec[i]\n",
" res = res+wts*((ell[i+1]+ell[i])/2.0)-((wts^2)/12.0)*(vll[i+1]-vll[i])\n",
" end\n",
" res\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 8,
"text": [
"ppml (generic function with 1 method)"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This function calculates the bounds on the marginal likelihood, based on the error from integration."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"function boundml(ell::Vector{Float64},tempvec::Vector{Float64})\n",
" tempdiff = tempvec[2:length(tempvec)] .- tempvec[1:(length(tempvec)-1)]\n",
" ub = sum(tempdiff .* ell[2:length(ell)])\n",
" lb = sum(tempdiff .* ell[1:(length(ell)-1)])\n",
" [lb ub]\n",
"end"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 9,
"text": [
"boundml (generic function with 1 method)"
]
}
],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now I can compute the log marginal likelihood from the series of tempered chains, and compare with the analytical result."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ppml(ell,vll,tempvec)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 10,
"text": [
"3.6195674785397585"
]
}
],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"boundml(ell,tempvec)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 11,
"text": [
"1x2 Array{Float64,2}:\n",
" 3.50298 3.73064"
]
}
],
"prompt_number": 11
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Stepping stone\n",
"\n",
"The [stepping stone approach](http://sysbio.oxfordjournals.org/content/early/2010/12/27/sysbio.syq085.short) also employs tempered distributions, but rather than integrating the expectation of the log likelihood, it employs importance sampling between adjacent tempered chains to calculate a series of normalising constants. The product of these normalising constants gives an estimate of the marginal likelihood. Here, I estimate the log ratio of marginal likelihoods."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"lrss=0.0\n",
"for i in 2:numtemp\n",
" tempdiff = tempvec[i]-tempvec[i-1]\n",
" logmaxll = maximum(pplist[i][(burnin+1):(burnin+niters),2])\n",
" oldll = pplist[i-1][(burnin+1):(burnin+niters),2]\n",
" lrss = lrss+tempdiff*logmaxll\n",
" lrss = lrss+log((1/length(oldll))*sum(exp(tempdiff*(oldll-logmaxll))))\n",
"end\n",
"lrss"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 12,
"text": [
"3.6237428915234613"
]
}
],
"prompt_number": 12
},
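{
"cell_type": "markdown",
"metadata": {},
"source": [
"Collecting the three values in one place (a convenience cell, not in the original): the power posterior and stepping stone estimates both agree with the analytical log marginal likelihood to roughly two decimal places."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Analytical value, power posterior estimate, stepping stone estimate\n",
"[lml(x,alpha,beta) ppml(ell,vll,tempvec) lrss]"
],
"language": "julia",
"metadata": {},
"outputs": []
},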
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, the code isn't that different between R and Julia (although the ability to do multiple assignment in a single line in Julia makes for more compact code). However, where they differ is in speed. The bottleneck is in running the MCMC across multiple temperatures. Firstly, let's calculate the time elapsed for a single run in Julia."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"@elapsed out=met_temper(x,lambda0,alpha,beta,sigma,1.0,100000)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 13,
"text": [
"0.312924833"
]
}
],
"prompt_number": 13
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In R, the equivalent code runs in about 2.7 seconds, or about ten times slower."
]
}
],
"metadata": {}
}
]
}