Skip to content

Instantly share code, notes, and snippets.

@sdwfrost
Last active January 31, 2020 00:41
Show Gist options
  • Save sdwfrost/b3d0c4cbff7ff460562162affceffb17 to your computer and use it in GitHub Desktop.
Save sdwfrost/b3d0c4cbff7ff460562162affceffb17 to your computer and use it in GitHub Desktop.
Branching process model of 2019nCoV spread

Branching process model of 2019nCoV spread

This is an attempt to reproduce Figure 1 from Imai et al. 2020, using code from Riou and Althaus (code here). I have included a Jupyter notebook (which includes the output), as well as RMarkdown and plain R files, generated using Jupytext.

Details missing from Imai et al. that I had to guess:

  1. The initial time (I took it to be 2019-12-01).
  2. The distribution of generation times (I took it to be gamma with standard deviation of 3.8).
  3. The summary curve is generated using the median of all cases at a given time (similarly with the 95 percentile range). The language in the figure legend is ambiguous, and could mean generating a median trajectory.

I have also tweaked the Riou and Althaus code, changing the parameterization and wrapping the main loop into a function, and have also used a fixed seed. To reduce computational burden, I have only generated 500 trajectories (cf 5000 in Imai et al.), but this should not affect the results (particularly the median curve) too much. The details of the implementation are missing from Imai et al., and so they could well be using a different language, random number generator, etc..

At present, the figures do not match up; the median number of cases should go through 4000 cases on 2020-01-18. This discrepancy may well be due to my errors in the above assumptions and code refactoring.

Display the source blob
Display the rendered blob
Raw
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
## Branching process model of epidemic spread
# This code is adapted from the study of [Riou and Althaus](https://www.biorxiv.org/content/10.1101/2020.01.23.917351v1), using the [code from GitHub](https://github.com/jriou/wcov), stripped down and rewritten for clarity.
library(ggplot2)
# Set random number seed.
set.seed(1234)
# Define Bellman-Harris branching process model.
bhbp <- function(R0,k,shape,scale,index_cases,max_cases,max_time){
t <- rep(0, index_cases)
times <- t
tmax <- 0
cases <- index_cases
while(cases > 0 & length(times) < max_cases) {
secondary = rnbinom(cases, size=k, mu=R0)
t.new = numeric()
for(j in 1:length(secondary)) {
t.new = c(t.new, t[j] + rgamma(secondary[j], shape = shape, scale = scale))
}
t.new = t.new[t.new < max_time]
cases = length(t.new)
t = t.new
times = c(times, t.new)
}
times <- sort(times)
return(times)
}
# Set parameter values, using values from [Imai et al.](https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-2019-nCoV-transmissibility.pdf) and assuming a gamma distribution of generation times with mean `mu` 8.4 days and standard deviation `sd` of 3.8 days, taken from [Lipsitch et al. (2003](https://science.sciencemag.org/content/300/5627/1966.full).
R0 <- 2.6
k <- 0.16
mu <- 8.4
stdev <- 3.8
shape <- (mu/stdev)^2
scale <- (stdev^2)/mu
# Plot the generation time distribution.
t <- seq(0,30,by=0.01)
g <- dgamma(t,shape=shape,scale=scale)
ggplot(data.frame(GenerationTime=t,Probability=g))+geom_line(aes(x=GenerationTime,y=Probability))
# Plot the offspring distribution.
i <- seq(0,10)
d <- dnbinom(i, size=k, mu=R0)
ggplot(data.frame(Number=i,Probability=d))+geom_bar(aes(x=Number,y=Probability),stat="identity")
# Initial and stopping conditions.
index_cases <- 40
max_cases <- 5e4
max_time <- 90
# Set the number of simulations (note - Imai et al. used 5000).
nsims <- 500
# Run simulations.
l <- list()
for(i in 1:nsims){
times <- bhbp(R0,k,shape,scale,index_cases,max_cases,max_time)
# Generates cumulative counts per day
# Note that this includes the index cases
l[[i]] <- cumsum(hist(times, breaks = 0:max_time,plot=FALSE)$counts)
}
# Combine individual runs into a dataframe.
results <- as.data.frame(do.call(cbind,l))
median_cases <- apply(results,1,median)
lq_cases <- apply(results,1,quantile,0.025)
uq_cases <- apply(results,1,quantile,0.975)
summary_results <- data.frame(Day=seq(1,max_time),
Date=as.Date("2019-12-01")+seq(1,max_time),
Median=median_cases,
Lower=lq_cases,Upper=uq_cases)
# Add day/dates with day 0 corresponding to 2019-12-01.
results$Day <- seq(1,max_time)
results$Date <- as.Date("2019-12-01")+results$Day
results$Day <- seq(1,max_time)
results$Date <- as.Date("2019-12-01")+results$Day
# Reshape results into 'long' format.
df <- reshape(results,varying=paste("V",1:nsims,sep=""),direction="long",sep="",idvar="Day",timevar="Run",v.names=c("Cases"))
# Plot trajectories over time, highlighting 4000 cases on 2020-01-18.
ntraj <- 20
ggplot(summary_results)+
geom_line(aes(x=Date,y=Median),color="red")+
geom_line(aes(x=Date,y=Upper),color="blue")+
geom_line(aes(x=Date,y=Lower),color="blue")+
geom_ribbon(aes(x=Date,ymin=Lower,ymax=Upper, linetype = NA), fill="blue", alpha = 0.1)+
geom_line(aes(x=Date,y=Cases,group=factor(Run)),data=df[df$Run<=ntraj,],color="gray")+
coord_cartesian(xlim=c(as.Date("2019-12-01"),as.Date("2020-01-30")),ylim=c(1,20000))+
geom_vline(xintercept=as.Date("2020-01-18"))+
geom_hline(yintercept=4000)+
theme_classic()
## Branching process model of epidemic spread
This code is adapted from the study of [Riou and Althaus](https://www.biorxiv.org/content/10.1101/2020.01.23.917351v1), using the [code from GitHub](https://github.com/jriou/wcov), stripped down and rewritten for clarity.
```{r}
library(ggplot2)
```
Set random number seed.
```{r}
set.seed(1234)
```
Define Bellman-Harris branching process model.
```{r}
bhbp <- function(R0,k,shape,scale,index_cases,max_cases,max_time){
t <- rep(0, index_cases)
times <- t
tmax <- 0
cases <- index_cases
while(cases > 0 & length(times) < max_cases) {
secondary = rnbinom(cases, size=k, mu=R0)
t.new = numeric()
for(j in 1:length(secondary)) {
t.new = c(t.new, t[j] + rgamma(secondary[j], shape = shape, scale = scale))
}
t.new = t.new[t.new < max_time]
cases = length(t.new)
t = t.new
times = c(times, t.new)
}
times <- sort(times)
return(times)
}
```
Set parameter values, using values from [Imai et al.](https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-2019-nCoV-transmissibility.pdf) and assuming a gamma distribution of generation times with mean `mu` 8.4 days and standard deviation `sd` of 3.8 days, taken from [Lipsitch et al. (2003](https://science.sciencemag.org/content/300/5627/1966.full).
```{r}
R0 <- 2.6
k <- 0.16
mu <- 8.4
stdev <- 3.8
shape <- (mu/stdev)^2
scale <- (stdev^2)/mu
```
Plot the generation time distribution.
```{r}
t <- seq(0,30,by=0.01)
g <- dgamma(t,shape=shape,scale=scale)
ggplot(data.frame(GenerationTime=t,Probability=g))+geom_line(aes(x=GenerationTime,y=Probability))
```
Plot the offspring distribution.
```{r}
i <- seq(0,10)
d <- dnbinom(i, size=k, mu=R0)
ggplot(data.frame(Number=i,Probability=d))+geom_bar(aes(x=Number,y=Probability),stat="identity")
```
Initial and stopping conditions.
```{r}
index_cases <- 40
max_cases <- 5e4
max_time <- 90
```
Set the number of simulations (note - Imai et al. used 5000).
```{r}
nsims <- 500
```
Run simulations.
```{r}
l <- list()
for(i in 1:nsims){
times <- bhbp(R0,k,shape,scale,index_cases,max_cases,max_time)
# Generates cumulative counts per day
# Note that this includes the index cases
l[[i]] <- cumsum(hist(times, breaks = 0:max_time,plot=FALSE)$counts)
}
```
Combine individual runs into a dataframe.
```{r}
results <- as.data.frame(do.call(cbind,l))
```
```{r}
median_cases <- apply(results,1,median)
lq_cases <- apply(results,1,quantile,0.025)
uq_cases <- apply(results,1,quantile,0.975)
summary_results <- data.frame(Day=seq(1,max_time),
Date=as.Date("2019-12-01")+seq(1,max_time),
Median=median_cases,
Lower=lq_cases,Upper=uq_cases)
```
Add day/dates with day 0 corresponding to 2019-12-01.
```{r}
results$Day <- seq(1,max_time)
results$Date <- as.Date("2019-12-01")+results$Day
```
```{r}
results$Day <- seq(1,max_time)
results$Date <- as.Date("2019-12-01")+results$Day
```
Reshape results into 'long' format.
```{r}
df <- reshape(results,varying=paste("V",1:nsims,sep=""),direction="long",sep="",idvar="Day",timevar="Run",v.names=c("Cases"))
```
Plot trajectories over time, highlighting 4000 cases on 2020-01-18.
```{r}
ntraj <- 20
ggplot(summary_results)+
geom_line(aes(x=Date,y=Median),color="red")+
geom_line(aes(x=Date,y=Upper),color="blue")+
geom_line(aes(x=Date,y=Lower),color="blue")+
geom_ribbon(aes(x=Date,ymin=Lower,ymax=Upper, linetype = NA), fill="blue", alpha = 0.1)+
geom_line(aes(x=Date,y=Cases,group=factor(Run)),data=df[df$Run<=ntraj,],color="gray")+
coord_cartesian(xlim=c(as.Date("2019-12-01"),as.Date("2020-01-30")),ylim=c(1,20000))+
geom_vline(xintercept=as.Date("2020-01-18"))+
geom_hline(yintercept=4000)+
theme_classic()
```
@slwu89
Copy link

slwu89 commented Jan 31, 2020

In case anyone wants to play around with a much faster version (posting here so its not only on twitter), I translated the bhbp simulation code into C and wrapped in an R package here: https://github.com/slwu89/nCoV

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment