title | author | date | output |
---|---|---|---|
How many to sample among suspected population of COVID-19 | Wan Nor Arifin | 1/15/2021 | |
Research question: How many people do we need to sample to have a 90% probability of detecting at least one positive COVID-19 case among the suspected population? This question is related to one of my posts from a few years back: http://wnarifin.blogspot.com/2014/12/how-many-cows-are-required-per-farm-so.html
Suppose the prevalence of COVID-19 is,
# prevalence of covid-19 in Malaysia
# ref: https://wnarifin.github.io/covid-19-malaysia/
p = 452.43/100000
Prevalence = 0.0045243
Suppose we set a cutoff value of 0.9 for the probability. A rule of thumb that is said to be used is to sample n = 20 if N < 50, or n = 30 or 10% of N if N > 50, where N is the size of the suspected population. To check whether this is reasonable, we can simulate the situations proposed by the rule of thumb.
# If we sample up to 30
n.low = 20  # lower limit of n
n.high = 30  # upper limit of n
n = n.low:n.high  # candidate sample sizes
pr = 1 - pbinom(0, n, p)  # P(at least one +ve case among n sampled)
det_p = cbind(n = n, Probability = pr)
det_p_max = det_p[which.max(det_p[,2]),]  # row with the highest probability
The maximum probability is only 0.1271896 for n = 30, far below the cutoff of 0.9.
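As a cross-check on the simulation, the smallest n that reaches the cutoff can also be computed directly. Under the same binomial model, the probability of at least one positive case among n sampled is 1 - (1 - p)^n, so we need n >= log(1 - cutoff) / log(1 - p). The following is a minimal sketch; the names `cutoff` and `n_required` are introduced here for illustration and are not part of the original code.

```r
# Minimal sketch: smallest n giving P(at least one +ve case) >= cutoff
p = 452.43/100000  # prevalence of covid-19 in Malaysia (same value as above)
cutoff = 0.9  # cutoff value for probability (illustrative name)
# solve 1 - (1 - p)^n >= cutoff for n, then round up to a whole person
n_required = ceiling(log(1 - cutoff) / log(1 - p))
n_required
# check against the binomial calculation used in the simulation
1 - pbinom(0, n_required, p)
```

At this prevalence the result is n = 508, far more than the 20 to 30 suggested by the rule of thumb.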
Next, we vary the size of the suspected population N and allow n to go up to 10% of N (n = .1*N).
N = c(50, 100, 500, 1000, 5000, 10000)
p = 0.0045243  # prevalence of covid-19 in Malaysia
det_df = data.frame(N = rep(0,6), n = rep(0,6), Probability = rep(0,6))
for(j in seq_along(N)) {
  if(N[j] == 50) {n.low = 20; n.high = 30}
  if(N[j] > 50) {
    n.low = 30  # lower limit of n
    n.high = max(.1*N[j], 30)  # upper limit of n at 10% N
  }
  n = n.low:n.high  # candidate sample sizes for this N
  pr = 1 - pbinom(0, n, p)  # P(at least one +ve case among n sampled)
  det_p = cbind(N = N[j], n = n, Probability = pr)
  # keep the n with the highest detection probability for this N
  det_df[j, ] = det_p[which.max(det_p[,2]),]
}
knitr::kable(det_df)
N | n | Probability |
---|---|---|
50 | 30 | 0.1271896 |
100 | 30 | 0.1271896 |
500 | 50 | 0.2028626 |
1000 | 100 | 0.3645720 |
5000 | 500 | 0.8964067 |
10000 | 1000 | 0.9892684 |
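We can see that the 10% rule only reaches a detection probability of about 90% once N is around 5000 (n = 500, probability 0.896). For N = 10000 (n = 1000) the probability is 0.989, while for N of 1000 or less it stays below 0.37.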
At this prevalence, it is not sensible to apply the rule of thumb of sampling n = 20 if N < 50, or n = 30 or 10% of N if N > 50. However, if the prevalence of COVID-19 is assumed to be higher among the suspected population, the rule may be reasonable. The code can be modified to test that assumption.
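As one way to test that assumption, the sketch below repeats the calculation for a few higher prevalence values. The values in `p_assumed` other than the first are hypothetical and chosen only for illustration; they are not estimates for any real suspected population. The names `p_assumed`, `pr_30`, `n_required` and `sens_df` are likewise introduced here and are not part of the original code.

```r
# Minimal sketch: how the conclusion changes with the assumed prevalence
# NOTE: all values after the first are hypothetical, for illustration only
p_assumed = c(452.43/100000, 0.01, 0.05, 0.10)
cutoff = 0.9  # cutoff value for probability
# probability of detecting at least one +ve case when n = 30
pr_30 = 1 - pbinom(0, 30, p_assumed)
# smallest n that reaches the cutoff at each assumed prevalence
n_required = ceiling(log(1 - cutoff) / log(1 - p_assumed))
sens_df = data.frame(Prevalence = p_assumed,
                     Probability_n30 = pr_30,
                     n_required = n_required)
sens_df
```

Among these illustrative values, n = 30 reaches the 0.9 cutoff only when the prevalence is 10%; at 5% the required n is 45, and at 1% it is 230.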