The calculation of bottleneck size (
This method is an attempt to formalize the intial development from this GitHub repository, which is based on the methodologies from this Nature Methods paper.
In the context of bottleneck estimation, "T0" refers to the initial time point where the frequency of each unique allele (or tag) in the population is recorded. The idea is to compare allele frequencies at this initial time point ("T0") with those at a later time point ("Tn") to estimate the effective population size (Ne) or the bottleneck size (Nb).
This code aims to calculate Nb
based on temporal allele frequency data:
# Calculate frequencies of alleles at T0
botneck.t0 <- bottleneck_data %>%
filter(doublings == 0) %>%
group_by(type) %>%
mutate(
fi0 = count / sum(count),
count0 = count
) %>%
select(type, spacer, fi0, count0) %>%
nest() %>%
rename(data0 = data) %>%
mutate(s0 = map_dbl(data0, ~ sum(.$count0)))
# Calculate frequencies of alleles at other times and compute Nb using the formula
botneck <- bottleneck_data %>%
filter(doublings != 0) %>%
group_by(type, condition, sample, doublings) %>%
mutate(
fis = count / sum(count)
) %>%
nest() %>%
mutate(
ss = map_dbl(data, ~ sum(.$count))
) %>%
full_join(botneck.t0) %>%
mutate(
data = map2(data, data0, inner_join)
)
botneck <- botneck %>%
mutate(
data = map(data, ~ .x %>% mutate(ratio = ((fis - fi0)^2) / (fi0 * (1 - fi0)^2)))
)
botneck <- botneck %>%
mutate(
f_hat = map_dbl(data, ~ sum(.$ratio)) * (1 / map_dbl(data, ~ n_distinct(.$spacer))),
Nb = doublings / (f_hat - 1 / s0 - 1 / ss)
)
# Summary Statistics
botneck.stats <- botneck %>%
group_by(condition, type) %>%
summarise(
Nb.med = median(Nb),
Nb.range = max(Nb) - min(Nb),
Nb.mean = mean(Nb),
Nb.sd = sd(Nb)
)
The statistics calculated are median, range, mean, and standard deviation of
-
Nb.med
: Median provides a robust measure of the central tendency of$N_b$ . -
Nb.range
: Range gives the span of$N_b$ , indicating the variability. -
Nb.mean
: Mean offers an average measure but can be affected by outliers. -
Nb.sd
: Standard deviation quantifies how much$N_b$ values deviate from the mean.