@BastinRobin
Last active March 10, 2021 08:36
Manually Calculate Principal Component Analysis
## PCA
NumPy has no pca() function, but a Principal Component Analysis is easy to compute step by step with NumPy functions.
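The example below defines a small 3×2 matrix, centers its columns, computes the covariance matrix of the centered data, and then takes the eigendecomposition of that covariance matrix. The eigenvectors are the principal components (the directions of greatest variance), and the eigenvalues are the variances of the data along those directions. The eigenvalues are not the singular values themselves: each one equals a squared singular value of the centered data divided by n − 1, where n is the number of rows. Finally, the centered data is projected onto the eigenvectors.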
from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig
# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column
M = mean(A, axis=0)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)
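As a cross-check on the steps above, here is a minimal sketch that compares the eigendecomposition route with an SVD of the centered data. It is an illustration under stated assumptions rather than part of the original example: the use of `eigh` (intended for symmetric matrices), the explicit descending sort, and the names `order`, `variances_from_svd`, `P_eig`, and `P_svd` are all choices made here.

```python
import numpy as np

# same 3x2 matrix as in the example above
A = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
C = A - A.mean(axis=0)  # center each column

# eigendecomposition route; eigh targets symmetric matrices
# and returns eigenvalues in ascending order
values, vectors = np.linalg.eigh(np.cov(C.T))
order = np.argsort(values)[::-1]  # largest variance first
values, vectors = values[order], vectors[:, order]

# SVD route on the centered data (singular values come back descending)
U, s, Vt = np.linalg.svd(C, full_matrices=False)

# eigenvalues of the covariance matrix equal s**2 / (n - 1)
n = C.shape[0]
variances_from_svd = s ** 2 / (n - 1)

# project the centered data onto the principal axes
P_eig = C.dot(vectors)
P_svd = U * s  # equivalent to C.dot(Vt.T)

print(values)
print(variances_from_svd)
```

The two projections can differ in sign column by column, because an eigenvector and its negation are equally valid, so any comparison between them should be made on absolute values.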