The expectation of the sample variance equals the variance

Denote the expectation by ${\mathbb E}[f(X)]:=\sum_x f(x)\cdot p(x)$, where $\sum_x$ is the sum (or integral) over all $x$ and $p(x)$ is the probability distribution in question.

Denote the variance by $\mathrm{Var}[X]:={\mathbb E}[(X-{\mathbb E}[X])^2]$

Further, define the standard deviation $\sigma[X]:=\mathrm{Var}[X]^\frac{1}{2}$ (which, if we use units, has the same units as ${\mathbb E}[X]$ or $X$ itself).
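As a minimal sketch of these three definitions in code (the uniform distribution `p` on $\{1,2,3\}$ is a made-up example, and `expectation` a hypothetical helper):

p = {1: 1/3, 2: 1/3, 3: 1/3}  # example distribution: uniform on {1, 2, 3}

def expectation(f, p):
    # E[f(X)] = sum_x f(x) * p(x)
    return sum(f(x) * px for x, px in p.items())

mean = expectation(lambda x: x, p)             # E[X] = 2.0
var = expectation(lambda x: (x - mean)**2, p)  # Var[X] = 2/3
sigma = var**0.5                               # sigma[X] = (2/3)**0.5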

A funny source of Python issues: the standard libraries do not all agree on the following.

import statistics, pandas, torch, numpy

xs = list(numpy.random.random(20))
xs

# All four libraries agree on the mean:
statistics.mean(xs)
pandas.Series(xs).mean()
numpy.mean(xs)
torch.mean(torch.tensor(xs))

# But not on the standard deviation (numpy is the odd one out):
statistics.stdev(xs)
pandas.Series(xs).std()
torch.std(torch.tensor(xs))
numpy.std(xs)

xs = [15.0, 17.0, 19.0]

statistics.mean(xs)  # 17.0
statistics.variance(xs)  # 4.0
statistics.stdev(xs)  # 2.0

pandas.Series(xs).mean()  # 17.0
pandas.Series(xs).var()  # 4.0
pandas.Series(xs).std()  # 2.0

numpy.mean(xs)  # 17.0
numpy.var(xs)  # 2.6666666666666665 = 8/3 = (2*(2**2))/3
numpy.std(xs)  # 1.632993161855452
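The discrepancy is the normalization: numpy defaults to dividing by $n$ (the population variance), while statistics, pandas, and torch divide by $n-1$ (the sample variance discussed below). Each library lets you pick the other convention explicitly:

numpy.var(xs, ddof=1)  # 4.0, now matching the others
numpy.std(xs, ddof=1)  # 2.0
pandas.Series(xs).var(ddof=0)  # 2.6666666666666665, numpy's default behavior
torch.std(torch.tensor(xs), unbiased=False)  # tensor(1.6330)
statistics.pvariance(xs)  # 2.6666666666666665, the population variance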


===Theorem (informal)===

Note: $\forall i.\; {\mathbb E}[X_i-X_1]=0 \implies {\mathbb E}\left[\sum_{i=1}^n X_i\right] = n\,{\mathbb E}[X_1]$

To discuss consequences of re-using samples of $X$, the discussion here is in terms of iid random variables $X_i$. For $x_i$ sampled from $X_i$, with $i\in\{1,2,\dots,n\}$, define the sample mean by

$\overline x := \frac{1}{n} \sum_{j=1}^n x_j$.

By the above implication, ${\mathbb E}[\overline X] = {\mathbb E}[X_1]$, so $\overline x$ may be used to estimate the expectation.
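A quick Monte Carlo check of ${\mathbb E}[\overline X] = {\mathbb E}[X_1]$ (a sketch; the uniform distribution on $[0,1)$ with ${\mathbb E}[X_1]=0.5$, the sample-set size $n=20$, and the number of trials are all arbitrary choices):

import numpy

n, trials = 20, 100_000
samples = numpy.random.random((trials, n))  # each row is one sample set of size n
print(samples.mean(axis=1).mean())  # ~0.5: the sample means average out to E[X_1]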

Note: Consider the case $n=2$. Then, whatever the samples are, the estimate $\overline x$ always lies between them. So $\overline x$, as a means to estimate ${\mathbb E}[X]$, is constrained by where the samples are, in a very geometric way.

Let $D(m) := \sum_{k=1}^n \left( x_k - m \right)^2$

Informal theorem: A "good" estimator of the variance $\mathrm{Var}[X]:={\mathbb E}[(X-{\mathbb E}[X])^2]$, via a sample set of size $n$, is the so-called sample variance $\frac{1}{n-1}D(\overline x)$.

Naive question: If the number $m$ is an estimate of ${\mathbb E}[X]$, we might want to estimate the variance via $\frac{1}{n} D(m)$. As $\overline x$ indeed estimates ${\mathbb E}[X]$, why not use $\frac{1}{n} D(\overline x)$? We will see that this naive formula fails to have the desired properties.

Hint: $D(\overline x) = \sum_{k=1}^n \left( x_k - \overline x \right)^2$ means a re-use of samples (each $x_k$ also enters $\overline x$), leading to bias. Note: $D'(m) = \sum_{k=1}^n 2 \left( x_k - m \right) \cdot (-1) = 2 n (m - \overline x)$, i.e. among all values for $m$, letting $m={\overline x}$ in fact minimizes $D$. In particular $D(\overline x) \leq D({\mathbb E}[X])$, whatever the true expectation is.
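A grid-search sketch of the minimization claim (re-using the sample set $[15, 17, 19]$ from above):

import numpy

xs = numpy.array([15.0, 17.0, 19.0])
D = lambda m: ((xs - m)**2).sum()  # D(m) as defined above
ms = numpy.linspace(10.0, 25.0, 1501)  # candidate values for m
print(ms[numpy.argmin([D(m) for m in ms])])  # ~17.0: D is minimized at m = the sample mean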

$\bullet$ We'll discuss why $\frac{1}{n-1}$ over $\frac{1}{n}$ is sensible in a formal proof.

$\bullet$ Dividing by the bigger $n$ leads to an underestimate, as we'll argue, although in practice the factor $\frac{n-1}{n}<1$ matters little for large $n$.


===Theorem===

The expectation of the sample variance (defined above, for any sample-set size $n \geq 2$) equals that of $(X-{\mathbb E}[X])^2$, i.e. the variance.

Proof: Since the $X_i$ are identically distributed and pairwise independent,

$i=j\implies {\mathbb E}[X_i\cdot X_j] = {\mathbb E}[X_1^2]$

$i\neq j\implies {\mathbb E}[X_i\cdot X_j] = {\mathbb E}[X_i]\cdot{\mathbb E}[X_j] = {\mathbb E}[X_1]^2$
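Both identities can be sanity-checked numerically (a sketch with standard uniform variables, for which ${\mathbb E}[X_1^2]=1/3$ and ${\mathbb E}[X_1]^2=1/4$):

import numpy

x = numpy.random.random((100_000, 2))  # the two columns play the roles of X_i, X_j
print((x[:, 0] * x[:, 0]).mean())  # ~1/3 = E[X_1^2], the i = j case
print((x[:, 0] * x[:, 1]).mean())  # ~1/4 = E[X_1]^2, the i != j case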

With these identities in hand, expanding $(X_k - X_j)(X_k - X_l) = X_k^2 - X_kX_l - X_jX_k + X_jX_l$ and counting how often each term falls into the $i=j$ versus $i\neq j$ case gives

${\mathbb E} \left[ \left( \sum_{j=1}^n (X_k - X_j) \right)^2 \right]$
$= {\mathbb E} \left[ \sum_{j=1}^n \sum_{l=1}^n (X_k - X_j)(X_k - X_l) \right]$
$= {\mathbb E}[X_1^2]\cdot\left(n^2 + 2n\cdot(-1) + n\cdot(-1)^2\right) + {\mathbb E}[X_1]^2\cdot\left(2n(n-1)\cdot(-1) + n(n-1)\cdot(-1)^2\right)$
$= n(n-1) \left({\mathbb E}[X_1^2] - {\mathbb E}[X_1]^2\right)$
$= n(n-1)\, {\mathbb E}[(X_1-{\mathbb E}[X_1])^2]$
$= n(n-1)\, \mathrm{Var}[X_1]$

Hence, by linearity of expectation and $X_k - \frac{1}{n} \sum_{j=1}^n X_j = \frac{1}{n} \sum_{j=1}^n (X_k - X_j)$,

${\mathbb E} \left[ \sum_{k=1}^n \left( X_k - \frac{1}{n} \sum_{j=1}^n X_j \right)^2 \right]$
$= \sum_{k=1}^n {\mathbb E} \left[ \left( X_k - \frac{1}{n} \sum_{j=1}^n X_j \right)^2 \right]$
$= \frac{1}{n^2} \sum_{k=1}^n {\mathbb E} \left[ \left( \sum_{j=1}^n (X_k - X_j) \right)^2 \right]$
$= (n-1)\, \mathrm{Var}[X_1]$

Dividing by $n-1$ gives ${\mathbb E}\left[\frac{1}{n-1} D(\overline X)\right] = \mathrm{Var}[X_1]$, which proves the theorem.

The average over all $\left( x_k - {\overline x} \right)^2$, i.e. the more naive formula $\frac{1}{n} D(\overline x)$, instead systematically underestimates the variance: its expectation is $\frac{n-1}{n}\, \mathrm{Var}[X_1]$.
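Both claims can be checked numerically (a sketch; uniform samples, so $\mathrm{Var}[X_1] = 1/12 \approx 0.0833$, with the sample-set size $n=5$ chosen small so that the bias is visible):

import numpy

n, trials = 5, 200_000
samples = numpy.random.random((trials, n))
# D(xbar) per sample set, i.e. the sum of squared deviations from the sample mean:
D = ((samples - samples.mean(axis=1, keepdims=True))**2).sum(axis=1)
print((D / (n - 1)).mean())  # ~0.0833: the sample variance is unbiased
print((D / n).mean())        # ~0.0667 = (n-1)/n * 1/12: the naive formula underestimates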
