@explodecomputer
Created July 23, 2021 14:29
---
title: Comparing survey answers before and after intervention
author: Gibran Hemani
---
```{r}
suppressMessages(suppressWarnings(suppressPackageStartupMessages({
library(knitr)
library(tidyverse)
})))
knitr::opts_chunk$set(warning=FALSE, message=FALSE)
```
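If you have 140 participants in the first round and 49 in the second round, and you ask both rounds the same yes/no question, you can test whether the proportion answering yes changes meaningfully from round 1 to round 2 with a two-sample test of equal proportions. In R, `prop.test` does this with a chi-squared test (with Yates' continuity correction by default); Fisher's exact test (`fisher.test`) is an alternative for the same 2x2 table.
For example, suppose 100/140 said yes in the first round, and 44/49 said yes in the second round: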
```{r}
prop.test(x=c(100,44), n=c(140,49))
```
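The null hypothesis is that the proportions in rounds 1 and 2 are the same. The p-value here (0.01625) is the probability of seeing a difference in proportions at least this large if the null hypothesis were true; it is not the probability that the null hypothesis is true. Because it is below 0.05, the proportions of yes are different enough to reject the null hypothesis at the 5% significance level.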
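For comparison, here is a sketch of Fisher's exact test on the same 2x2 table, together with a Monte Carlo permutation test that makes the "if the null hypothesis were true" reading concrete: it pools the 189 answers, repeatedly shuffles which round each answer belongs to, and counts how often the shuffled difference in proportions is at least as large as the observed one. The variable names, the seed and the 10,000 shuffles are illustrative choices, not part of the original analysis, and the permutation p-value is a random approximation, so it will not match `prop.test` or `fisher.test` exactly.

```{r}
# 2x2 table: rows = yes/no, columns = round 1 / round 2 (filled column by column)
tab <- matrix(c(100, 40,
                44, 5),
              nrow = 2,
              dimnames = list(answer = c("yes", "no"), round = c("1", "2")))
fisher_p <- fisher.test(tab)$p.value

# Permutation sketch (illustrative): under the null the round label is
# unrelated to the answer, so shuffling the labels mimics the null
set.seed(2021)
answers <- c(rep(1, 144), rep(0, 45))  # 144 yes and 45 no across both rounds
rounds <- c(rep(1, 140), rep(2, 49))   # 140 in round 1, 49 in round 2
obs_diff <- abs(100 / 140 - 44 / 49)
perm_diff <- replicate(10000, {
  shuffled <- sample(answers)
  abs(mean(shuffled[rounds == 1]) - mean(shuffled[rounds == 2]))
})
# Small tolerance so shuffles that exactly tie the observed difference count
perm_p <- mean(perm_diff >= obs_diff - 1e-9)

c(fisher = fisher_p, permutation = perm_p)
```

Both should land in the same region as the `prop.test` p-value; they answer the same question with slightly different approximations.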
When the sample sizes are small, the difference in proportions needs to be larger before it registers as statistically significant. Just for illustration, we can look at every combination of 'yes' counts before and after to see in which scenarios you would get a p-value below 0.05:
```{r}
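# Every combination of 'yes' counts: before (out of 140) and after (out of 49)
a <- expand.grid(
  before = 1:139,
  after = 1:48
)
# p-value from a two-sample test of equal proportions for each combination
a$pval <- purrr::map2_dbl(a$before, a$after, function(b, af) {
  prop.test(x = c(b, af), n = c(140, 49))$p.value
})
# Colour each tile by whether it would be called significant at p < 0.05
ggplot(a, aes(x = before, y = after)) +
  geom_tile(aes(fill = pval < 0.05)) +
  labs(x = "Number of 'yes' before", y = "Number of 'yes' after", fill = "p < 0.05")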
```
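So at this sample size quite a lot of the possible combinations give p < 0.05; the non-significant ones form a band along the diagonal, where the before and after proportions are similar.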
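To put numbers on this, and on the sample-size point above, here is a small sketch. The first part fixes round 1 at the observed 100/140 and lists which round-2 counts (out of 49) give p < 0.05. The second part is a hypothetical example, not from the original survey: it holds a 70% versus 90% split fixed with equal group sizes and shows the `prop.test` p-value shrinking as the groups grow. Warnings about the chi-squared approximation for small expected counts are suppressed here for brevity.

```{r}
# Round 1 fixed at 100/140; test every possible round-2 count
after_counts <- 0:49
p_after <- sapply(after_counts, function(k) {
  suppressWarnings(prop.test(x = c(100, k), n = c(140, 49))$p.value)
})
sig_after <- after_counts[p_after < 0.05]
sig_after

# Hypothetical: the same 70% vs 90% split at increasing (equal) group sizes
n_per_group <- c(10, 20, 50, 100)
p_by_n <- sapply(n_per_group, function(n) {
  suppressWarnings(prop.test(x = c(7, 9) * n / 10, n = c(n, n))$p.value)
})
setNames(p_by_n, n_per_group)
```

The same 20-point gap is far from significant with 10 per group but clearly significant with 100 per group.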
A couple of notes:
1. Fisher developed the foundational theory behind much of the statistics used today, but sadly he was also a eugenicist with some fairly unpleasant views.
2. Using a p < 0.05 threshold is really not the way to do science these days; I've used it here only for illustration, but do read: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913
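One way to act on that advice, offered here as a suggestion rather than something from the original analysis, is to report the estimated difference in proportions with its confidence interval instead of only a pass/fail at 0.05. `prop.test` already returns both for the example above:

```{r}
res <- prop.test(x = c(100, 44), n = c(140, 49))
res$estimate                       # proportion saying yes in each round
res$estimate[1] - res$estimate[2]  # difference, round 1 minus round 2
res$conf.int                       # 95% CI for that difference (continuity-corrected)
```

Here the interval for round 1 minus round 2 lies entirely below zero, i.e. a higher proportion said yes in round 2, and its width shows how precisely the change is estimated.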