title: How many to sample among suspected population of COVID-19
author: Wan Nor Arifin
date: 1/15/2021
output: html_document (keep_md: true)

About

Research question: How many people do we need to sample to have a 90% probability of detecting at least one positive COVID-19 case among the suspected population? This question is related to a post I wrote a few years ago at http://wnarifin.blogspot.com/2014/12/how-many-cows-are-required-per-farm-so.html

Suppose the prevalence of COVID-19 is,

# prevalence of covid-19 in Malaysia
# ref: https://wnarifin.github.io/covid-19-malaysia/
p = 452.43/100000

Prevalence = 0.0045243

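Suppose we set a cutoff value of 0.9 for the probability of detection. The rule of thumb is said to be: sample n = 20 if N < 50, or n = 30 or 10% of N if N > 50, where N is the size of the suspected population and n is the sample size. To find out whether this is reasonable, we can compute the detection probability for the sample sizes proposed by the rule of thumb.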
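For reference, the quantity computed in the code that follows, `1 - pbinom(0, n, p)`, can be written out explicitly. This assumes each sampled person is an independent draw with probability p of being positive (the binomial model, which ignores the finite size of N):

$$
P(\text{at least one positive}) = 1 - (1 - p)^n \geq 0.9
\quad\Longrightarrow\quad
n \geq \frac{\ln(1 - 0.9)}{\ln(1 - p)}
$$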

Rule of thumb solution

# If we sample between 20 and 30 people
n.low = 20 # lower limit of n
n.high = 30 # upper limit of n
n_ = n.low:n.high # candidate sample sizes
# P(at least one positive) = 1 - P(zero positives among n)
pr_ = 1 - pbinom(0, n_, p)
det_p = cbind(n=n_, Probability=pr_)
# row with the highest detection probability
det_p_max = det_p[which.max(det_p[,2]),]

The maximum probability is only 0.1271896 for n = 30.
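
To see how far n = 30 falls short, the binomial expression can be inverted to solve for the smallest n that reaches the 0.9 cutoff. This is a minimal sketch rather than part of the original analysis: `target` and `n_min` are names introduced here, and it assumes independent sampling at the prevalence `p` defined earlier.

```r
p <- 452.43 / 100000   # prevalence, as defined earlier
target <- 0.9          # assumed cutoff for the detection probability

# 1 - (1 - p)^n >= target  <=>  n >= log(1 - target) / log(1 - p)
n_min <- ceiling(log(1 - target) / log(1 - p))
n_min                        # 508

# Check with the same expression used above
1 - pbinom(0, n_min, p)      # just above 0.9
1 - pbinom(0, n_min - 1, p)  # just below 0.9
```

Under these assumptions, about 508 people would need to be sampled, far more than the 20 to 30 suggested by the rule of thumb.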

10 percent rule

Next, we let n go up to 10% of N, where N is the size of the suspected population.

N = c(50, 100, 500, 1000, 5000, 10000)
p = 0.0045243 # prevalence of covid-19 in Malaysia
det_df = data.frame(N = rep(0,6), n = rep(0,6), Probability = rep(0,6))
for(j in 1:length(N)) {
  if(N[j] == 50) {n.low = 20; n.high = 30}
  if(N[j] > 50) {
    n.low = 30 # lower limit of n
    n.high = max(.1*N[j], 30) # upper limit of n at 10% N
  }
  n_ = n.low:n.high # candidate sample sizes for this N
  # P(at least one positive) = 1 - P(zero positives among n)
  pr_ = 1 - pbinom(0, n_, p)
  det_p = cbind(N=N[j], n=n_, Probability=pr_)
  # keep the row with the highest detection probability
  det_df[j, ] = det_p[which.max(det_p[,2]),]
}
knitr::kable(det_df)

N	n	Probability
50	30	0.1271896
100	30	0.1271896
500	50	0.2028626
1000	100	0.3645720
5000	500	0.8964067
10000	1000	0.9892684
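
The 10% rule only comes close to the 0.9 cutoff when N is 5000 (n = 500, probability 0.896) and exceeds it when N is 10000 (n = 1000, probability 0.989). For smaller N, the probability is far below 0.9.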

Conclusion

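At this prevalence, it is not sensible to apply the rule of thumb of sampling n = 20 if N < 50, or n = 30 or 10% of N if N > 50: the probability of detecting at least one positive case stays well below 0.9 unless N is around 5000 or more. However, if the prevalence of COVID-19 is assumed to be higher among the suspected population, the rule may be reasonable. The code above can be changed to test that assumption.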
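
One way to test that assumption is sketched below. The prevalence values other than 0.0045243 are hypothetical and chosen only for illustration, and `min_n`, `p_assumed` and `res` are names introduced here. The sketch uses the same binomial model as the rest of this document.

```r
# Smallest n with P(at least one positive) >= target under a binomial model
min_n <- function(p, target = 0.9) {
  ceiling(log(1 - target) / log(1 - p))
}

# First value is the prevalence used above; the rest are hypothetical
p_assumed <- c(0.0045243, 0.01, 0.05, 0.10)

res <- data.frame(
  Prevalence = p_assumed,
  n_required = min_n(p_assumed),             # n needed to reach 0.9
  Pr_n30     = 1 - pbinom(0, 30, p_assumed)  # detection probability at n = 30
)
print(res)
```

With these illustrative values, n = 30 reaches the 0.9 cutoff at an assumed prevalence of 10% but not at 5%.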

