explodecomputer · July 23, 2021 14:29
diff --git a/proportions.rmd b/proportions.rmd
 ---
 title: Comparing survey answers before and after intervention
 author: Gibran Hemani
 ---

 ```{r}
 suppressMessages(suppressWarnings(suppressPackageStartupMessages({
 	library(knitr)
 	library(tidyverse)
 })))

 knitr::opts_chunk$set(warning=FALSE, message=FALSE)

 ```

 If you have 140 participants in the first round and 49 in the second round, and you ask each round a yes/no question, you can test whether the proportion answering yes changes from round 1 to round 2 using a two-sample test of equal proportions (`prop.test` in R). Fisher's exact test (`fisher.test`) is a closely related alternative.

 For example, suppose 100/140 said yes in the first round, and 44/49 said yes in the second round:

 ```{r}
 prop.test(x=c(100,44), n=c(140,49))
 ```
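
 `prop.test()` runs a chi-squared test of equal proportions rather than Fisher's exact test. As a minimal sketch, the same counts can be arranged in a 2x2 table and passed to `fisher.test()`; the object name `tab` and the dimension labels are illustrative assumptions, not part of the original analysis.

 ```{r}
 # 2x2 table of counts: rows are survey rounds, columns are answers
 tab <- matrix(
 	c(100, 140 - 100,   # round 1: yes, no
 	   44,  49 - 44),   # round 2: yes, no
 	nrow = 2, byrow = TRUE,
 	dimnames = list(round = c("round 1", "round 2"), answer = c("yes", "no"))
 )

 # Fisher's exact test of association between round and answer
 fisher.test(tab)
 ```

 The two tests usually lead to similar conclusions at sample sizes like these; the exact test is often preferred when some cell counts are small.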

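 The null hypothesis is that the proportions in round 1 and 2 are the same. The p-value here (0.01625) is the probability of seeing a difference in proportions at least this large if the null hypothesis were true; it is not the probability that the null hypothesis is true. Because it is below 0.05, the proportions of yes are sufficiently different to reject the null hypothesis at the p < 0.05 level.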
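
 To make the p-value more concrete, the sketch below recomputes it by hand. It assumes `prop.test()` is run with its defaults (a chi-squared statistic with Yates' continuity correction, two-sided); the variable names are introduced here for illustration only.

 ```{r}
 x <- c(100, 44)   # number of 'yes' answers in each round
 n <- c(140, 49)   # number of participants in each round

 # Under the null hypothesis both rounds share one pooled proportion
 p_pooled <- sum(x) / sum(n)
 expected <- n * p_pooled

 # Yates' continuity correction, capped at the smallest deviation
 yates <- min(0.5, abs(x - expected))

 # Chi-squared statistic on 1 degree of freedom
 stat <- sum((abs(x - expected) - yates)^2 / (n * p_pooled * (1 - p_pooled)))
 p_manual <- pchisq(stat, df = 1, lower.tail = FALSE)

 # Compare the hand calculation with prop.test
 all.equal(p_manual, prop.test(x = x, n = n)$p.value)
 ```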

 When the sample sizes are smaller, the difference in proportions needs to be larger to be detected as statistically significant. Just for illustration, we can look at different combinations of 'yes' counts before and after to see in which scenarios you would find a p-value less than 0.05, e.g.

 ```{r}
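 # Every combination of 'yes' counts in round 1 (n=140) and round 2 (n=49)
 a <- expand.grid(
 	before=1:139,
 	after=1:48,
 	pval=NA_real_
 )

 # Test each combination and store its p-value
 for(i in seq_len(nrow(a)))
 {
 	a$pval[i] <- prop.test(x=c(a$before[i], a$after[i]), n=c(140, 49))$p.value
 }

 # Tile plot showing which combinations give p < 0.05
 ggplot(a, aes(x=before, y=after)) +
 	geom_tile(aes(fill=pval < 0.05)) +
 	labs(x="Number of 'yes' before", y="Number of 'yes' after")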

 ```

 So quite a lot of the possible combinations give a statistically significant difference (p < 0.05) at these sample sizes.
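
 As a rough way to put a number on "quite a lot", the sketch below rebuilds the same grid and computes the share of combinations with p < 0.05. The names `grid` and `share_significant` are introduced here for illustration, and `suppressWarnings()` is an added assumption to silence the chi-squared approximation warnings that arise for small counts.

 ```{r}
 # Same grid of 'yes' counts as above, rebuilt so this chunk is self-contained
 grid <- expand.grid(before = 1:139, after = 1:48)

 # p-value for every combination, computed without an explicit loop
 grid$pval <- mapply(
 	function(b, a) {
 		suppressWarnings(prop.test(x = c(b, a), n = c(140, 49))$p.value)
 	},
 	grid$before, grid$after
 )

 # Share of combinations that fall below the 0.05 threshold
 share_significant <- mean(grid$pval < 0.05)
 share_significant
 ```

 Note that this counts every combination equally, regardless of how plausible it would be in a real survey.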

 A couple of notes:

 1. Fisher developed foundational theory behind much of the statistics used today, but sadly he was also a eugenicist with some fairly unpleasant views.
 2. Using p < 0.05 as a threshold is really not the way to do science these days. It is used here only for illustration; do read: https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913