Last active
August 15, 2022 20:41
-
-
Save dlebech/5e2885e852bdca38d99c27a55fb6636c to your computer and use it in GitHub Desktop.
Binomial probability calculation function in SQL (BigQuery)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
-- Public Domain CC0 license. https://creativecommons.org/publicdomain/zero/1.0/ | |
-- Calculate the probability of k successes for n trials with probability of success k, | |
-- using the binomial distribution. | |
-- Calculate the binomial coefficient using the "multiplicative formula" | |
CREATE OR REPLACE FUNCTION functions.binomial_coef(n INT64, k INT64) AS (( | |
-- k!/(n!*(n-k)!) | |
-- We're going to have a hard time doing factorials here, | |
-- but based on the "multiplicative formula" in Wiki, it should be possible: | |
-- https://en.wikipedia.org/wiki/Binomial_coefficient#Computing_the_value_of_binomial_coefficients | |
-- Taken together with the fact that we can do EXP(SUM(LOG(x))) to do product over multiple rows, we have: | |
SELECT | |
-- Binomial coefficient is always an integer, | |
-- but BigQuery is not super happy about EXP(SUM(LN(x))) trick, | |
-- so the ROUND is added to be safe :-) | |
ROUND(EXP(SUM(LN((n + 1 - i) / i)))) | |
FROM | |
-- Take advantage of symmetry (n // k) == (n // (n-k)) | |
UNNEST(GENERATE_ARRAY(1, IF(k > n - k, n - k, k), 1)) i | |
)); | |
-- Calculate the probability of k successes in n trials where the probability of a success is p. | |
-- This uses the probability density function for the binomial distribution. | |
-- See example below. | |
CREATE OR REPLACE FUNCTION functions.binomial_prob(n INT64, k INT64, p FLOAT64) AS ( | |
functions.binomial_coef(n, k) * POW(p, k) * POW(1-p, n-k) | |
); | |
-- The example from Wikipedia: https://en.wikipedia.org/wiki/Binomial_distribution#Example | |
-- Probability of getting 4 heads in 6 tosses with a coin that is biased and has heads probability 0.3 | |
-- Calculates to 0.05953499999999999 | |
SELECT functions.binomial_prob(6, 4, 0.3); |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Aha, that makes sense @benman1 I'm happy to hear it worked out for you with a different solution 🙂