Inspired by the Real Statistics Resources article by Charles Zaiontz, I wanted to make the described excel formulas work automatically and agnostic to the orientation of the contingency table.
Using the same contingency table as in the real-statistics link above:
A | B | C | D | |
---|---|---|---|---|
5 | Intervention | Control | Total | |
6 | Cured | 2 | 7 | 9 |
7 | Not Cured | 9 | 3 | 12 |
8 | Total | 11 | 10 | 21 |
Then create the same table as Figure 2 in the linked article, but the cumulative probabilities are not required:
K | L | |
---|---|---|
19 | Count | Prob |
20 | 0 | 3.40217E-05 |
21 | 1 | 0.001684074 |
22 | 2 | 0.022454326 |
23 | 3 | 0.117885211 |
24 | 4 | 0.282924506 |
... I filled the table up to a count of 30 so the contingency table could accept a wider range of frequencies.
with the probability in L20 generated using the formula =HYPGEOM.DIST(K20,$B$8,$D$6,$D$8,FALSE)
.
Note the $ in front of cell references making the references "fixed" which allows the formula to be dragged down
and only the reference to K20
will change. You could also use named ranges in the contingency table if you prefer.
We use the value of the upper left cell as an index to lookup the corresponding probability =XLOOKUP($B$6,K20:K50,L20:L50)
. I named this cell "cutpt", but you could refer to this cell using A1 notation if you prefer. It has the value 0.022454326
in the example.
To create the 2 tail p-value, we then we need to sum all probabilities Less than or equal to the "cutpt" probability above, which can be achieved using the formula
=SUM(L20:L50*(L20:L50<=cutpt))
. It will have the value 0.02997312
Note you could use the SUMIF
function if you prefer - the function above simply multiplies all probabilities greater than cutpt
by 0, and all probabilities less than or equal to the cutpoint by 1 before calculating the total.