Created
July 23, 2021 14:29
Comparing survey answers before and after intervention
---
title: Comparing survey answers before and after intervention
author: Gibran Hemani
---

```{r}
suppressMessages(suppressWarnings(suppressPackageStartupMessages({
	library(knitr)
	library(tidyverse)
})))
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```
If you have 140 participants in the first round and 49 in the second round, and you ask each round a yes/no question, you can test whether the proportion that say yes changed from round 1 to round 2 using a two-sample test of equal proportions (a chi-squared test, `prop.test` in R). Fisher's exact test (`fisher.test`) is a common alternative, particularly when counts are small. e.g.

Suppose 100/140 said yes in the first round, and 44/49 said yes in the second round:
```{r}
prop.test(x=c(100,44), n=c(140,49))
```
The null hypothesis is that the proportions in round 1 and 2 are the same. The p-value here (0.01625) is the probability of seeing a difference at least this large if the null hypothesis were true; it is not the probability that the null hypothesis is true. The proportions of yes are sufficiently different to reject the null hypothesis at the p < 0.05 level.
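As a cross-check, the same counts can also be put through Fisher's exact test. This is only a minimal sketch using base R's `fisher.test`; the 2x2 table layout and the names `tab` and `ft` are my own choices for illustration.

```{r}
# 2x2 table of counts: rows are answers, columns are rounds
# (100 yes / 40 no out of 140, and 44 yes / 5 no out of 49)
tab <- matrix(
	c(100, 40, 44, 5),
	nrow=2,
	dimnames=list(answer=c("yes", "no"), round=c("round1", "round2"))
)
# Fisher's exact test of no association between round and answer
ft <- fisher.test(tab)
ft$p.value
```

The exact test and `prop.test` will not give identical p-values, because one is exact and the other relies on a chi-squared approximation, but they address the same question.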
When the sample sizes are small, the difference in proportions needs to be larger before it can be distinguished from chance. Just for illustration, we can look at different combinations of 'yes' counts before and after to see in which scenarios you would find a p-value less than 0.05, e.g.
```{r}
# All combinations of 'yes' counts before (out of 140) and after (out of 49)
a <- expand.grid(
	before=1:139,
	after=1:48,
	pval=NA
)
# Test each combination for a difference in proportions
for(i in seq_len(nrow(a)))
{
	a$pval[i] <- prop.test(x=c(a$before[i], a$after[i]), n=c(140, 49))$p.value
}
# Shade the combinations that give p < 0.05
ggplot(a, aes(x=before, y=after)) +
	geom_tile(aes(fill=pval < 0.05)) +
	labs(x="Number of 'yes' before", y="Number of 'yes' after")
```
So at these sample sizes, quite a lot of the possible before/after combinations give a p-value below 0.05.
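To put a number on "quite a lot", one option is to compute the share of combinations in the grid that fall below 0.05. This is a rough sketch in base R that rebuilds the same grid so it can run on its own; the names `grid` and `share_sig` are mine.

```{r}
# Same ranges of 'yes' counts as in the plot
grid <- expand.grid(before=1:139, after=1:48)
# p-value for every combination; warnings about the chi-squared
# approximation at small counts are suppressed here
grid$pval <- suppressWarnings(mapply(
	function(b, a) prop.test(x=c(b, a), n=c(140, 49))$p.value,
	grid$before, grid$after
))
# Fraction of combinations with p < 0.05
share_sig <- mean(grid$pval < 0.05)
share_sig
```

Bear in mind this treats every combination as equally likely, which real survey answers will not be, so it only summarises the picture.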
A couple of notes:

1. Fisher developed the foundational theory behind much of the statistics used today, but sadly he was also a eugenicist with some fairly unpleasant views.
2. Using p < 0.05 as a threshold is really not the way to do science these days. It is used here only for illustration, but do read: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913
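In the spirit of the second note, it may be more informative to report the estimated difference and its uncertainty than a yes/no significance call. Here is a small sketch that pulls those out of the `prop.test` result for the example counts; `res` and `diff_est` are names I have made up for illustration.

```{r}
res <- prop.test(x=c(100, 44), n=c(140, 49))
# Estimated proportion of 'yes' in each round
res$estimate
# Estimated difference in proportions (round 1 minus round 2)
diff_est <- unname(res$estimate[1] - res$estimate[2])
diff_est
# 95% confidence interval for that difference
res$conf.int
```

The interval gives a range of differences that are compatible with the data, which says more about the size of the change than the p-value alone.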