Principal Components Analysis (PCA) tries to identify the subspace in which the data approximately lies. It is an unsupervised learning technique used for dimensionality reduction, and it is computationally efficient: it only requires an eigenvector calculation.
Imagine you have a dataset in three dimensions: height, weight, and age of a group of people. Each person in the dataset is a data point represented by a vector (height, weight, age). Now, PCA aims to find the directions in which the data varies the most.
In this case, let's say that height and weight have high variance compared to age. PCA would identify the directions (principal components) along which the data varies the most. These directions are orthogonal to each other.
Now, instead of representing each person in the original three-dimensional space, PCA allows you to project them onto a new subspace defined by these principal components. This new subspace retains the most important information about the data while reducing the dimensionality.
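To make this concrete, here is a minimal NumPy sketch of that projection; the height/weight/age numbers are made up purely for illustration:

```python
import numpy as np

# Hypothetical measurements (made up for this example): one row per person,
# columns are height (cm), weight (kg), age (years).
X = np.array([
    [170.0, 65.0, 30.0],
    [180.0, 80.0, 31.0],
    [160.0, 55.0, 29.0],
    [175.0, 72.0, 30.0],
    [165.0, 60.0, 32.0],
])

# Center the data so each feature has zero mean.
X_centered = X - X.mean(axis=0)

# Covariance matrix of the three features (3 x 3).
C = np.cov(X_centered, rowvar=False)

# Eigenvectors of C are the principal components; sort by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

# Keep the top two components and project every person onto them.
W = eigvecs[:, :2]            # 3 x 2 projection matrix
X_projected = X_centered @ W  # n x 2 coordinates in the new subspace
print(X_projected)
```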
Getting a little technical...
Suppose we are given a dataset $\{x_1, x_2, \ldots, x_n\}$, where each $x_i \in \mathbb{R}^d$. We want to project the given data onto a lower-dimensional subspace (dimensionality reduction) such that:
- Reconstruction error is minimized.
- Variance of the projected data is maximized.
We see that the projected data for the following data points has a fairly large variance, and the points tend to be far from zero. In contrast, suppose we had instead picked the following direction:
Here, the projections have a significantly smaller variance and are much closer to the origin.
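The same contrast can be seen numerically; in this sketch the 2-D points and the two candidate directions are made up for illustration:

```python
import numpy as np

# Made-up 2-D points that vary mostly along the diagonal direction (1, 1).
X = np.array([[3.0, 2.8], [1.0, 1.2], [-2.0, -1.9], [4.0, 3.7], [-3.0, -3.2]])
X = X - X.mean(axis=0)

def projected_variance(X, u):
    """Variance of the scalar projections x_i^T u for a unit vector u."""
    u = u / np.linalg.norm(u)
    return (X @ u).var()

u_spread = np.array([1.0, 1.0])   # roughly along the data's spread
u_perp = np.array([1.0, -1.0])    # roughly perpendicular to it

print(projected_variance(X, u_spread))  # fairly large
print(projected_variance(X, u_perp))    # much smaller
```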
We would like to automatically select the direction $u$ along which the projected data has the largest variance.
To formalize this:
Given: a dataset $\{x_1, x_2, \ldots, x_n\}$ with mean $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$.
Goal: project the data onto a unit vector $u$ such that the variance of the projected data is maximized.
Note that, given a unit vector $u$ and a point $x$:
$\rightarrow \text{Projection}_{u}(x) = (x^Tu)u$
$\rightarrow \text{Mean of Projection}_{u}(x) = (\bar{x}^Tu)u$
$\rightarrow \text{Variance of Projection}_{u}(x) = \frac{1}{n} \sum_{i=1}^n (x_i^Tu - \bar{x}^Tu)^2$
Digression: expanding the variance of the projection gives
$$\frac{1}{n} \sum_{i=1}^n (x_i^Tu - \bar{x}^Tu)^2 = \frac{1}{n} \sum_{i=1}^n u^T(x_i - \bar{x})(x_i - \bar{x})^Tu = u^TCu$$
where $C = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$ is known as the covariance matrix; note that $C$ does not depend on $u$.
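As a quick sanity check (on arbitrary random data and an arbitrary unit vector, both generated just for this check), the variance computed from the definition above matches the quadratic form $u^TCu$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # arbitrary dataset, n = 100 points in 3 dimensions
x_bar = X.mean(axis=0)

u = rng.normal(size=3)
u = u / np.linalg.norm(u)       # arbitrary unit vector

# Variance of the projections, straight from the definition above.
var_direct = np.mean((X @ u - x_bar @ u) ** 2)

# The same quantity via the covariance matrix C = (1/n) * sum (x_i - x_bar)(x_i - x_bar)^T.
Xc = X - x_bar
C = (Xc.T @ Xc) / len(X)
var_quadratic = u @ C @ u

print(np.isclose(var_direct, var_quadratic))  # True
```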
Next, we'll try to maximize the variance of the projection:
$$\max_u \space u^TCu \quad \text{s.t.} \quad u^Tu = 1$$
Employing the Lagrange multiplier method on this objective, the Lagrangian is $\mathcal{L}(u, \lambda) = u^TCu - \lambda(u^Tu - 1)$, and setting its gradient with respect to $u$ to zero gives
$$Cu = \lambda u$$
We left-multiply by $u^T$:
$$u^TCu = \lambda u^Tu = \lambda$$
so the variance of the projection along $u$ equals the eigenvalue $\lambda$.
In order to maximize the projected variance, we therefore need to find the greatest eigenvalue and its corresponding eigenvector.
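A small sketch, on toy data generated just for this check, that the top eigenvector really is the maximizer: its variance equals the largest eigenvalue, and random unit vectors never exceed it:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.3])  # toy data, stretched along one axis
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / len(X)

# Eigen-decomposition of the symmetric covariance matrix (eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(C)
u_star, lambda_max = eigvecs[:, -1], eigvals[-1]

# The top eigenvector attains variance lambda_max ...
print(np.isclose(u_star @ C @ u_star, lambda_max))  # True

# ... and no random unit vector does better.
for _ in range(1000):
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert u @ C @ u <= lambda_max + 1e-12
```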
Step 1: Compute the mean vector $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$.
Step 2: Compute the covariance matrix $C = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T$.
Step 3: Find the eigenvalues and eigenvectors of $C$.
Step 4: Select the eigenvectors corresponding to the $k$ greatest eigenvalues and derive the transformed data points on the projection:
$$\tilde{x}_i = \sum_{j=1}^k \alpha_j u_j, \quad \text{where} \space \alpha_j = (x_i^Tu_j) \space \text{and} \space u_j \space \text{is the corresponding eigenvector}$$
Step 5: Calculate the reconstruction error and projected variance as:
$$\text{Reconstruction Error} = \frac{1}{n} \sum_{i=1}^n \|x_i - \tilde{x}_i\|^2, \quad \text{Projected Variance} = \sum_{j=1}^k \lambda_j$$
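One way Steps 1 through 5 might look in NumPy; this is a sketch, the `pca` helper and its return values are my own naming rather than any standard API, and it centers the data before projecting:

```python
import numpy as np

def pca(X, k):
    """Steps 1-5 on an (n x d) data matrix X, keeping k principal components."""
    n = X.shape[0]

    # Step 1: mean vector.
    x_bar = X.mean(axis=0)
    Xc = X - x_bar  # work with centered data, the usual convention

    # Step 2: covariance matrix C = (1/n) * sum (x_i - x_bar)(x_i - x_bar)^T.
    C = (Xc.T @ Xc) / n

    # Step 3: eigenvalues and eigenvectors (eigh, since C is symmetric).
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 4: keep the eigenvectors of the k greatest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    lam, U = eigvals[order], eigvecs[:, order]  # U is d x k

    # alpha_j = x_i^T u_j, and the reconstruction is x~_i = sum_j alpha_j u_j.
    alphas = Xc @ U           # n x k coefficients in the new subspace
    X_tilde = alphas @ U.T    # n x d reconstructions (still centered)

    # Step 5: reconstruction error and projected variance.
    reconstruction_error = np.mean(np.sum((Xc - X_tilde) ** 2, axis=1))
    projected_variance = lam.sum()  # sum of the retained eigenvalues
    return alphas, reconstruction_error, projected_variance

# Example usage on random data.
X = np.random.default_rng(2).normal(size=(50, 5))
scores, err, var = pca(X, k=2)
print(err, var)
```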
Note: Since the covariance matrix is symmetric, its eigenvectors (also known as the principal components) are orthogonal to each other.
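A quick numerical illustration of this note, using a covariance matrix built from random data:

```python
import numpy as np

# Build a covariance matrix from some random data.
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
Xc = X - X.mean(axis=0)
C = (Xc.T @ Xc) / len(X)

# Because C is symmetric, its eigenvectors form an orthonormal set: V^T V = I.
eigvals, eigvecs = np.linalg.eigh(C)
print(np.allclose(eigvecs.T @ eigvecs, np.eye(4)))  # True
```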