# Manually Calculate Principal Component Analysis

Gist `BastinRobin/8a317a7e891a106801d2f754fb57fe7a`, last active March 10, 2021 08:36.
## PCA

NumPy has no `pca()` function, but we can easily calculate a Principal Component Analysis step by step using NumPy functions.

The example below defines a small 3×2 matrix, centers each column by subtracting its mean, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of that covariance matrix. The eigenvectors are taken as the principal components, the eigenvalues give the variance of the data along each component, and the centered data are projected onto the eigenvectors.
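The same steps can be written in matrix notation. This is a summary under the assumption that $A$ is an $n \times p$ matrix with one sample per row ($n = 3$, $p = 2$ here). The symbols $\mu$, $\mathbf{1}$ (a length-$n$ vector of ones), $W$ and $\Lambda$ are introduced only for this summary: $W$ corresponds to `vectors` in the code and the diagonal of $\Lambda$ to `values`.

```math
\begin{aligned}
\mu &= \tfrac{1}{n} A^{\top}\mathbf{1} && \text{column means (M in the code)}\\
C &= A - \mathbf{1}\mu^{\top} && \text{centered data}\\
V &= \tfrac{1}{n-1}\, C^{\top} C && \text{sample covariance, as returned by cov}\\
V W &= W \Lambda && \text{eigenvectors in the columns of } W \text{, eigenvalues on the diagonal of } \Lambda\\
P &= C W && \text{projected data (P.T in the code)}
\end{aligned}
```

Because $V = \tfrac{1}{n-1} C^{\top} C$, each eigenvalue in $\Lambda$ equals $s_i^2/(n-1)$, where $s_i$ is the corresponding singular value of $C$. The eigenvalues are therefore variances, not singular values themselves.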
```python
from numpy import array, mean, cov
from numpy.linalg import eig

# define a 3x2 matrix: 3 samples (rows), 2 features (columns)
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column
M = mean(A, axis=0)
print(M)
# center columns by subtracting column means (broadcast across rows)
C = A - M
print(C)
# calculate covariance matrix of centered matrix;
# cov() treats each row as a variable, so pass the transpose
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix:
# the columns of `vectors` are the eigenvectors (principal components);
# eig() does not guarantee that the eigenvalues come back sorted
values, vectors = eig(V)
print(vectors)
print(values)
# project the centered data onto the principal components
P = vectors.T.dot(C.T)
# transpose back so that each row is one projected sample
print(P.T)
```
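`eig` is a general eigensolver and does not return the eigenvalues in any particular order. The sketch below repeats the same computation under two assumptions: it uses `numpy.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix, and it sorts the components by decreasing eigenvalue. The function name `pca` is hypothetical and not a NumPy API. The last lines cross-check the eigenvalues against the singular values of the centered data.

```python
import numpy as np


def pca(A):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix.

    Returns (values, vectors, projected) with components sorted by
    decreasing eigenvalue (explained variance).
    """
    A = np.asarray(A, dtype=float)
    # center each column by subtracting its mean
    C = A - A.mean(axis=0)
    # sample covariance of the columns (divides by n - 1)
    V = np.cov(C, rowvar=False)
    # eigh is intended for symmetric matrices; it returns real eigenvalues
    # in ascending order, so reverse the order to put the largest first
    values, vectors = np.linalg.eigh(V)
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    # project the centered data onto the principal components
    # (one row per sample, one column per component)
    projected = C @ vectors
    return values, vectors, projected


A = np.array([[1, 2], [3, 4], [5, 6]])
values, vectors, projected = pca(A)
print(values)
print(vectors)
print(projected)

# cross-check: covariance eigenvalues equal s**2 / (n - 1),
# where s are the singular values of the centered data
C = A - A.mean(axis=0)
s = np.linalg.svd(C, compute_uv=False)
print(np.allclose(values, s**2 / (len(A) - 1)))
```

With the explicit sort, the first column of `vectors` is always the direction of largest variance. The sign of each eigenvector is arbitrary, so projected values may come out negated compared with other tools.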