Suppose you have $n$ random variables $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, each holding $m$ observations (so each $\mathbf{x}_i$ is a vector of length $m$).
Now, between any two random variables $\mathbf{x}_i$ and $\mathbf{x}_j$, the covariance is defined as:
$$ \sigma_{\mathbf{x}_i, \mathbf{x}_j} = E[(\mathbf{x}_i - \mu_{\mathbf{x}_i})(\mathbf{x}_j - \mu_{\mathbf{x}_j})] $$
(where $\mu_{\mathbf{x}_i}$ and $\mu_{\mathbf{x}_j}$ are the means of $\mathbf{x}_i$ and $\mathbf{x}_j$ respectively.)
Important assumption: For the sake of simplicity, assume that all data is centered, as @njoshi pointed out. In other words, $\mu_{\mathbf{x}_i} = 0$ for every $\mathbf{x}_i$.
So the expression for covariance changes to the following:
$$ \sigma_{\mathbf{x}_i, \mathbf{x}_j} = E[\mathbf{x}_i\mathbf{x}_j] $$
I.e. the covariance of two centered RVs is the expectation of their elementwise product.
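For a quick numerical check of this, here is a minimal numpy sketch (the data and variable names are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x_i = rng.normal(size=1000)
x_j = 0.5 * x_i + rng.normal(size=1000)

# Center both RVs so their means are exactly zero.
x_i -= x_i.mean()
x_j -= x_j.mean()

# For centered RVs, covariance = expectation (mean) of the elementwise product.
cov_manual = np.mean(x_i * x_j)
cov_numpy = np.cov(x_i, x_j, bias=True)[0, 1]  # bias=True divides by m, not m - 1
print(np.isclose(cov_manual, cov_numpy))       # True
```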
Key point: There is a crucial relationship between the expectation of the product of two RVs and the inner product of the RVs. Notice that the RHS of the above expression can be written as:
$$ E[\mathbf{x}_i\mathbf{x}_j] = E[x_{i1} \times x_{j1}, x_{i2} \times x_{j2}, x_{i3} \times x_{j3}, \ldots, x_{im} \times x_{jm}] $$
$$ \therefore E[\mathbf{x}_i\mathbf{x}_j] = \frac{\sum^{m}_{k=1}{x_{ik} \times x_{jk}}}{m} $$
Note that the numerator of the RHS is just the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$.
Thus, the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$ (denoted $\overline{\mathbf{x}_i\mathbf{x}_j}$ from here on) is just their covariance scaled by the number of observations:
$$ \overline{\mathbf{x}_i\mathbf{x}_j} = m\times\sigma_{\mathbf{x}_i, \mathbf{x}_j} $$
(Hold that thought, we'll come back to it.)
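Here is a small numpy sketch of that relationship (illustrative data; `m` is the number of observations per RV):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 500                                      # number of observations per RV
x_i = rng.normal(size=m); x_i -= x_i.mean()  # centered RV x_i
x_j = rng.normal(size=m); x_j -= x_j.mean()  # centered RV x_j

inner = np.dot(x_i, x_j)                     # inner product of the two RVs
cov = np.cov(x_i, x_j, bias=True)[0, 1]      # covariance (divisor m)
print(np.isclose(inner, m * cov))            # True: <x_i, x_j> = m * sigma_{x_i, x_j}
```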
Now, consider a matrix $\mathbf{X}$ of size $m \times n$, in which each of the $n$ RVs forms a column:
$$ \mathbf{X} = \begin{bmatrix} x_{11} & x_{21} & x_{31} & \ldots & x_{n1} \\
x_{12} & x_{22} & x_{32} & \ldots & x_{n2} \\
x_{13} & x_{23} & x_{33} & \ldots & x_{n3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{1m} & x_{2m} & x_{3m} & \ldots & x_{nm} \end{bmatrix} $$
(Note that each RV $\mathbf{x}_i$ is expanded above as $[x_{i1}\ x_{i2}\ x_{i3}\ \ldots\ x_{im}]$, i.e. as a column of its $m$ observations.)
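In numpy terms, $\mathbf{X}$ is just the centered RVs stacked as columns. A sketch under the same assumptions (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 100, 4                                 # m observations, n RVs
rvs = [rng.normal(size=m) for _ in range(n)]  # the n RVs
rvs = [x - x.mean() for x in rvs]             # center each RV

X = np.column_stack(rvs)                      # column i of X is the RV x_i
print(X.shape)                                # (100, 4), i.e. (m, n)
```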
So now, if we evaluate $\mathbf{X}^T\mathbf{X}$, we get:
$$ \mathbf{X}^T\mathbf{X} = \begin{bmatrix} \overline{\mathbf{x}_1\mathbf{x}_1} & \overline{\mathbf{x}_1\mathbf{x}_2} & \overline{\mathbf{x}_1\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_1\mathbf{x}_n} \\
\overline{\mathbf{x}_2\mathbf{x}_1} & \overline{\mathbf{x}_2\mathbf{x}_2} & \overline{\mathbf{x}_2\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_2\mathbf{x}_n} \\
\overline{\mathbf{x}_3\mathbf{x}_1} & \overline{\mathbf{x}_3\mathbf{x}_2} & \overline{\mathbf{x}_3\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_3\mathbf{x}_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\overline{\mathbf{x}_n\mathbf{x}_1} & \overline{\mathbf{x}_n\mathbf{x}_2} & \overline{\mathbf{x}_n\mathbf{x}_3} & \ldots & \overline{\mathbf{x}_n\mathbf{x}_n}
\end{bmatrix} $$
where $\overline{\mathbf{x}_i\mathbf{x}_j}$ is, as before, the inner product of $\mathbf{x}_i$ and $\mathbf{x}_j$.
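This is easy to verify numerically: each entry of $\mathbf{X}^T\mathbf{X}$ is the dot product of the corresponding pair of columns. Again a minimal sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 4
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)                           # center each column (RV)

G = X.T @ X                                   # n x n matrix of pairwise inner products
i, j = 1, 3
print(np.isclose(G[i, j], np.dot(X[:, i], X[:, j])))  # True
```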
Now, from the result above, we know that the inner product of two RVs is just their covariance scaled by $m$. Substituting, we get:
$$ \mathbf{X}^T\mathbf{X} = \begin{bmatrix} m\sigma_{\mathbf{x}_1\mathbf{x}_1} & m\sigma_{\mathbf{x}_1\mathbf{x}_2} & m\sigma_{\mathbf{x}_1\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_1\mathbf{x}_n} \\
m\sigma_{\mathbf{x}_2\mathbf{x}_1} & m\sigma_{\mathbf{x}_2\mathbf{x}_2} & m\sigma_{\mathbf{x}_2\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_2\mathbf{x}_n} \\
m\sigma_{\mathbf{x}_3\mathbf{x}_1} & m\sigma_{\mathbf{x}_3\mathbf{x}_2} & m\sigma_{\mathbf{x}_3\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_3\mathbf{x}_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
m\sigma_{\mathbf{x}_n\mathbf{x}_1} & m\sigma_{\mathbf{x}_n\mathbf{x}_2} & m\sigma_{\mathbf{x}_n\mathbf{x}_3} & \ldots & m\sigma_{\mathbf{x}_n\mathbf{x}_n} \end{bmatrix} $$
Dividing both sides by $m$ gives us the covariance matrix:
$$ \frac{\mathbf{X}^T\mathbf{X}}{m} = \begin{bmatrix} \sigma_{\mathbf{x}_1\mathbf{x}_1} & \sigma_{\mathbf{x}_1\mathbf{x}_2} & \sigma_{\mathbf{x}_1\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_1\mathbf{x}_n} \\
\sigma_{\mathbf{x}_2\mathbf{x}_1} & \sigma_{\mathbf{x}_2\mathbf{x}_2} & \sigma_{\mathbf{x}_2\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_2\mathbf{x}_n} \\
\sigma_{\mathbf{x}_3\mathbf{x}_1} & \sigma_{\mathbf{x}_3\mathbf{x}_2} & \sigma_{\mathbf{x}_3\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_3\mathbf{x}_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sigma_{\mathbf{x}_n\mathbf{x}_1} & \sigma_{\mathbf{x}_n\mathbf{x}_2} & \sigma_{\mathbf{x}_n\mathbf{x}_3} & \ldots & \sigma_{\mathbf{x}_n\mathbf{x}_n}
\end{bmatrix} $$
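To sanity-check the whole derivation, here is a sketch comparing $\mathbf{X}^T\mathbf{X}/m$ against numpy's own covariance routine (with `bias=True` so that it also divides by $m$; the data is made up):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 1000, 5
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)                            # center each column (RV)

cov_from_gram = (X.T @ X) / m                  # the result derived above
cov_numpy = np.cov(X, rowvar=False, bias=True) # columns are variables, divisor is m
print(np.allclose(cov_from_gram, cov_numpy))   # True
```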
The covariance matrix is a compact, convenient way to represent the covariances of all pairs of RVs. A few properties worth noting:
- Covariance is symmetric, i.e. $\sigma_{\mathbf{x}_i, \mathbf{x}_j} = \sigma_{\mathbf{x}_j, \mathbf{x}_i}$. Therefore the matrix is also symmetric.
- The diagonal holds the variances, since the covariance of an RV with itself is just the variance of that RV.
- Among $n$ RVs, the number of ways in which they can be paired is $\binom{n}{2}$ - and that is exactly half the number of off-diagonal entries in an $n \times n$ matrix (the remaining half carries the same values, since the matrix is symmetric).
- Being symmetric and positive semi-definite, the covariance matrix enjoys a lot of convenient properties - and it is because of these that we can, among other things, perform PCA and SVD (see the sketch below).
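As a final illustrative sketch (made-up data again), the symmetry and positive semi-definiteness are easy to observe via an eigendecomposition, which is also the doorway to PCA:

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 1000, 5
X = rng.normal(size=(m, n))
X -= X.mean(axis=0)                            # centered data matrix

C = (X.T @ X) / m                              # covariance matrix

print(np.allclose(C, C.T))                     # True: symmetric
eigvals, eigvecs = np.linalg.eigh(C)           # eigh is meant for symmetric matrices
print(np.all(eigvals >= -1e-12))               # True: eigenvalues non-negative (PSD)
# The eigenvectors are the principal directions used in PCA; equivalently,
# they are the right singular vectors of the centered X from its SVD.
```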