@dill
Last active November 16, 2025 11:41
how much unique data is there in a bootstrap?
---
title: Bootstraps and unique data
author: David L Miller
date: 2025/11/16
execute:
  keep-md: true
  cache: true
knitr:
  opts_chunk:
    fig.path: ./
---

"The bootstrap draws $N$ times uniformly with replacement from a dataset with $N$ items. The probability an item is picked at least once is $1 - (1 - 1/N)^N$, which for large $N$ becomes $1 - e^{-1} \approx 0.632$. Hence, the number of unique data points in a bootstrap sample is $0.632 N$ on average."
From [this paper](http://arxiv.org/abs/1612.01474).

"...for large $N$..." How big does $N$ need to be?

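
The question matters because the quoted $0.632 N$ is only the limiting value. Spelling out the argument, with $U$ denoting the number of distinct items in one bootstrap sample (notation introduced here, not taken from the paper), linearity of expectation gives the exact finite-$N$ expectation, even though the inclusion events for different items are not independent:

$$
\begin{aligned}
P(\text{item } i \text{ is never drawn}) &= \left(1 - \frac{1}{N}\right)^{N}, \\
E[U] = \sum_{i=1}^{N} P(\text{item } i \text{ is drawn at least once}) &= N\left[1 - \left(1 - \frac{1}{N}\right)^{N}\right] \approx \left(1 - e^{-1}\right) N \approx 0.632\,N .
\end{aligned}
$$

So "how big" comes down to how close $1 - (1 - 1/N)^N$ is to $1 - e^{-1}$.
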
```{r, how-big-is-N}
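# probability that a given item appears at least once
# in a bootstrap sample of size N
pickp <- function(N) 1 - (1 - 1/N)^N
pickp_dat <- data.frame(N = 100:10000,
                        p = pickp(100:10000))

library(ggplot2)
ggplot(pickp_dat) +
  geom_line(aes(x = N, y = p)) +
  # dashed red line marks the large-N limit, 1 - exp(-1)
  geom_hline(yintercept = 1 - exp(-1), lty = 2, colour = "red") +
  labs(x = "Sample size", y = "Probability of inclusion at least once") +
  theme_minimal()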
```
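
The plot starts at $N = 100$, so a few numbers for smaller $N$ may help. The chunk below is a minimal sketch rather than part of the original analysis: `inclusion_prob()` is a hypothetical helper that repeats the formula above, and the grid of $N$ values is an arbitrary choice. It also prints the first-order approximation to the gap, $e^{-1}/(2N)$, which comes from expanding $N \log(1 - 1/N) = -1 - 1/(2N) - \dots$ and is rough for very small $N$.

```{r gap-to-limit}
# Hypothetical helper (not in the original gist): same formula as above
inclusion_prob <- function(N) 1 - (1 - 1/N)^N
limit <- 1 - exp(-1)

# arbitrary grid of sample sizes, chosen for illustration only
N_vals <- c(1, 2, 5, 10, 100, 1000, 10000)

gap_dat <- data.frame(N = N_vals,
                      p = inclusion_prob(N_vals),
                      # exact distance above the large-N limit
                      gap = inclusion_prob(N_vals) - limit,
                      # first-order approximation to that distance
                      approx_gap = exp(-1) / (2 * N_vals))
print(gap_dat, digits = 4)
```

The gap is always positive, so the inclusion probability approaches 0.632 from above, and by $N = 100$ it is already below 0.002.
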

Empirically, what does that look like?

```{r vis-inclusion}
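# draw 10000 bootstrap samples of the indices 1..10000;
# row i of vismat holds how often each index was drawn in sample i
# (a 10000 x 10000 numeric matrix needs roughly 800 MB of memory)
samples <- 1:10000
vismat <- matrix(0, 10000, 10000)
for(i in samples){
  this_sample <- sample(samples, length(samples), replace=TRUE)
  tab_this_sample <- table(this_sample)
  vismat[i, as.numeric(names(tab_this_sample))] <- as.numeric(tab_this_sample)
}
# sort the counts within each sample for plotting;
# left in index order the image is just static
vismat <- apply(vismat, 1, sort)
# items that were never drawn become NA so they are left blank
vismat[vismat==0] <- NA
# rotate
vismat <- t(apply(vismat, 2, rev))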
```
```{r plot-inc, fig.width=4, fig.height=7}
library(viridis)
image(z=vismat, col=viridis_pal()(11), xlab="", ylab="", axes=FALSE)
axis(2)
```
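
A numeric check to go with the image: the sketch below estimates the average fraction of distinct items per bootstrap sample and compares it with the formula. It is an illustration under assumed settings, not the code used for the plot; `N`, `n_boot` and the seed are arbitrary choices, kept small so that no large matrix is needed.

```{r unique-fraction}
# Illustrative sketch: N, n_boot and the seed are arbitrary assumptions
set.seed(1)
N <- 1000
n_boot <- 2000

# fraction of distinct original items in each bootstrap sample
frac_unique <- replicate(n_boot, {
  length(unique(sample.int(N, N, replace = TRUE))) / N
})

# simulated mean next to the exact finite-N value and the large-N limit
c(simulated = mean(frac_unique),
  exact = 1 - (1 - 1/N)^N,
  limit = 1 - exp(-1))
```

The simulated mean should sit close to the exact finite-$N$ value, which is itself slightly above the 0.632 limit.
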
dill commented Nov 16, 2025

Plots!

[Figure: how-big-is-N-1]
[Figure: plot-inc-1]
