Estimation of Bottleneck Size

The bottleneck size ($N_b$) is calculated on the same principle as effective population size estimation, adapted to quantify the 'founder' population size, i.e., the number of cells that contribute descendants to the sample. The formula used in this study is taken from Krimbas and Tsakas and, for our data, provided the best fit for estimating $N_b$. It calculates $N_b$ from the allele frequencies at the initial and final time points ($f_{i_0}$ and $f_{i_s}$), the total number of sequence reads at each of those time points ($s_0$ and $s_s$), and the number of generations between them (`doublings` in the code).
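
Written out, the estimator as the code in this document implements it takes the form below. The symbols $k$ (number of distinct alleles) and $g$ (number of generations) are introduced here for notation only, and the expression is transcribed from the code rather than from the cited source, so it is worth checking against that source before relying on it.

$$
\hat{F} = \frac{1}{k} \sum_{i=1}^{k} \frac{\left(f_{i_s} - f_{i_0}\right)^2}{f_{i_0}\left(1 - f_{i_0}\right)^2},
\qquad
N_b = \frac{g}{\hat{F} - \dfrac{1}{s_0} - \dfrac{1}{s_s}}
$$

The two subtracted terms, $1/s_0$ and $1/s_s$, remove the variance expected from finite read depth at each time point, so that only the frequency change beyond sampling noise is attributed to the bottleneck.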

This method is an attempt to formalize the initial development from this GitHub repository, which is based on the methodology of this Nature Methods paper.

T0 Explanation

In the context of bottleneck estimation, "T0" refers to the initial time point, at which the frequency of each unique allele (or tag) in the population is recorded. The idea is to compare allele frequencies at this initial time point ("T0") with those at a later time point ("Tn") to estimate the effective population size ($N_e$) or the bottleneck size ($N_b$).
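
As a minimal illustration of that comparison, the base R sketch below converts read counts into frequencies at T0 and at a later time point, then takes the per-tag difference. The tag names and counts are invented for illustration, and the column names `count_0`, `count_n`, and `shift` are hypothetical.

```r
# Hypothetical read counts for four tags (spacers); values are illustrative only
counts <- data.frame(
  spacer  = c("A", "B", "C", "D"),
  count_0 = c(250, 250, 250, 250),  # reads at T0
  count_n = c(400, 300, 200, 100)   # reads at a later time point, Tn
)

# Frequencies are counts divided by the total reads at that time point
counts$fi0 <- counts$count_0 / sum(counts$count_0)
counts$fis <- counts$count_n / sum(counts$count_n)

# Per-tag change in frequency between T0 and Tn
counts$shift <- counts$fis - counts$fi0

print(counts)
```

Every tag starts at a frequency of 0.25 here, so any departure from 0.25 at Tn reflects drift, selection, or sampling noise. The estimator attributes larger departures to a smaller bottleneck.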

Code Comments

This code aims to calculate Nb based on temporal allele frequency data:

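# Load the tidyverse packages used below
library(dplyr)
library(tidyr)
library(purrr)

# bottleneck_data is assumed to have the columns used below:
# type, condition, sample, doublings, spacer, count

# Calculate frequencies of alleles at T0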
botneck.t0 <- bottleneck_data %>%
  filter(doublings == 0) %>%
  group_by(type) %>%
  mutate(
    fi0 = count / sum(count),
    count0 = count
  ) %>%
  select(type, spacer, fi0, count0) %>%
  nest() %>%
  rename(data0 = data) %>%
  mutate(s0 = map_dbl(data0, ~ sum(.$count0)))

# Calculate frequencies of alleles at other times and compute Nb using the formula
botneck <- bottleneck_data %>%
  filter(doublings != 0) %>%
  group_by(type, condition, sample, doublings) %>%
  mutate(
    fis = count / sum(count)
  ) %>%
  nest() %>%
  mutate(
    ss = map_dbl(data, ~ sum(.$count))
  ) %>%
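  # Attach the T0 data for the matching type
  full_join(botneck.t0, by = "type") %>%
  mutate(
    # Keep only alleles observed at both time points
    data = map2(data, data0, ~ inner_join(.x, .y, by = "spacer"))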
  )

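# Per-allele squared change in frequency, scaled by the T0 frequency term
botneck <- botneck %>%
  mutate(
    data = map(data, ~ .x %>% mutate(ratio = ((fis - fi0)^2) / (fi0 * (1 - fi0)^2)))
  )

# Average the per-allele ratios to get f_hat, then correct for read depth to get Nb
botneck <- botneck %>%
  mutate(
    f_hat = map_dbl(data, ~ sum(.$ratio)) / map_dbl(data, ~ n_distinct(.$spacer)),
    Nb = doublings / (f_hat - 1 / s0 - 1 / ss)
  )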

# Summary Statistics
botneck.stats <- botneck %>%
  group_by(condition, type) %>%
  summarise(
    Nb.med = median(Nb),
    Nb.range = max(Nb) - min(Nb),
    Nb.mean = mean(Nb),
    Nb.sd = sd(Nb)
  )
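
To sanity-check the pipeline above on a small input, the same arithmetic can be sketched in base R for a single sample. The helper `estimate_nb` and its arguments are hypothetical and not part of the original code. It assumes both count vectors list the same alleles in the same order, that each allele appears once, and that every T0 count is positive.

```r
# Hypothetical helper: Nb for one sample from paired read counts.
# count0 and counts must list the same alleles in the same order.
estimate_nb <- function(count0, counts, generations) {
  stopifnot(length(count0) == length(counts), all(count0 > 0))

  s0 <- sum(count0)   # total reads at T0
  ss <- sum(counts)   # total reads at the later time point
  fi0 <- count0 / s0  # allele frequencies at T0
  fis <- counts / ss  # allele frequencies at the later time point

  # Same per-allele ratio as in the pipeline, averaged over alleles
  ratio <- (fis - fi0)^2 / (fi0 * (1 - fi0)^2)
  f_hat <- mean(ratio)

  # Subtract the read-depth correction terms
  denom <- f_hat - 1 / s0 - 1 / ss

  # Assumption of this sketch: return NA when the correction exceeds f_hat
  if (denom <= 0) return(NA_real_)
  generations / denom
}

nb <- estimate_nb(
  count0 = c(250, 250, 250, 250),
  counts = c(400, 300, 200, 100),
  generations = 10
)
print(nb)
```

With these invented counts the estimate comes out at about 115. Returning `NA` when the denominator is not positive is a choice made for this sketch. The pipeline above does not guard against that case, so it can yield negative or infinite values of `Nb` when the frequency change is no larger than sampling noise.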

Statistics Explanation

The statistics calculated are the median, range, mean, and standard deviation of $N_b$ for each combination of condition and type. Together they give an overview of the bottleneck size distribution:

Nb.med: Median provides a robust measure of the central tendency of $N_b$.
Nb.range: Range (maximum minus minimum) gives the span of $N_b$ values, a coarse indicator of variability.
Nb.mean: Mean offers an average measure but can be affected by outliers.
Nb.sd: Standard deviation quantifies how much $N_b$ values deviate from the mean.
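
The difference between these summaries shows up clearly on a small made-up example. The base R sketch below uses four invented $N_b$ values, one of which is an outlier. The vector `nb_values` is hypothetical and not drawn from any dataset.

```r
# Hypothetical Nb estimates for one condition; the last value is an outlier
nb_values <- c(100, 110, 120, 5000)

# Same four summaries as in botneck.stats
stats <- c(
  Nb.med   = median(nb_values),
  Nb.range = max(nb_values) - min(nb_values),
  Nb.mean  = mean(nb_values),
  Nb.sd    = sd(nb_values)
)

print(stats)
```

Here the median stays at 115 while the mean is pulled up to 1332.5 by the single outlier, which is why the median is the safer headline figure when a few samples give extreme estimates.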

ryandward/bottleneck.md
