Created
January 4, 2023 04:54
-
-
Save dlakelan/c064185efe44f4db9358438a5b2785e8 to your computer and use it in GitHub Desktop.
When the "true" distribution isn't in the model space, does Bayes give a reasonable answer?
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
using Distributions, Random, StatsPlots, Turing | |
# We have a pallet of 100 jugs of Orange Juice, each is a 2L jug. | |
# they are filled by a machine which may have a flaw. we would like to find | |
# the avg amount of OJ in a jug by taking a sample of 20 | |
# this is a finite population, and the actual distribution of volumes is just given | |
# by 100 numbers. we'll say they are something like this: | |
Random.set_global_seed!(1234) | |
realdata = shuffle!(vcat(repeat([0.125],20),rand(Normal(1.90,0.12),60), | |
rand(Uniform(1.15,1.5),20))) | |
h = histogram(realdata) | |
display(h) | |
density(realdata; bandwidth=0.1) | |
# This looks nothing like an exponential distribution... but the bayesian data analyst | |
# knows that the volume must be a non-negative number with an average... the maximum | |
# entropy distribution for such a thing is exponential, so will model the distribution of | |
# weights as exponential with a given average. The "true" distribution is nothing like | |
# the ones within the model family, so there can be no "true" parameter. Nevertheless | |
# Bayes will converge to the right answer because exponential distributions with | |
# average near the true average will put the data in a high density region of the | |
# predictive distribution. | |
@model function inferexp(d) | |
m ~ Uniform(0.0,2.5) | |
# the average must be something in the range 0 to some value you'd get if you overfilled a bottle as much as possible, definitely less than 2.5 liters | |
d ~ filldist(Exponential(m),length(d)) | |
end | |
takesamp = sample(realdata,40;replace=false) | |
sm = sample(inferexp(takesamp),NUTS(200,.8),200) | |
p = density(get(sm,:m).m.data; title="Inferred mean value",xlim=(0,2.5),label="Posterior estimate of mean volume") | |
plot!(p,[mean(realdata),mean(realdata)],[0.0,2.0]; label="True mean value of 100 bottles") |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment