@BastinRobin
Created March 10, 2021 08:36
PCA

There is no pca() function in NumPy, but we can calculate Principal Component Analysis step by step using NumPy functions. The example below defines a small 3×2 matrix, centers its columns, calculates the covariance matrix of the centered data, and then computes the eigendecomposition of that covariance matrix. The eigenvectors are the principal components, and the eigenvalues are the variances of the data along those components (they equal the squared singular values of the centered data divided by n - 1, not the singular values themselves). Finally, the centered data is projected onto the eigenvectors.
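In symbols, the steps can be sketched as follows, using the variable names from the code. The symbols n (number of rows of A), W (the matrix returned as `vectors`), Λ (the diagonal matrix built from `values`), and **1** (a column of n ones) are notation introduced here for illustration. The factor 1/(n - 1) reflects the default normalization of `numpy.cov`.

$$
\begin{aligned}
M &= \frac{1}{n} A^{\top} \mathbf{1} \\
C &= A - \mathbf{1} M^{\top} \\
V &= \frac{1}{n-1} C^{\top} C \\
V &= W \Lambda W^{\top} \\
P^{\top} &= C W
\end{aligned}
$$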

from numpy import array
from numpy import mean
from numpy import cov
from numpy.linalg import eig

# define a matrix
A = array([[1, 2], [3, 4], [5, 6]])
print(A)
# calculate the mean of each column
M = mean(A, axis=0)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
# (each column of `vectors` is an eigenvector; eig does not sort them by eigenvalue)
values, vectors = eig(V)
print(vectors)
print(values)
# project the centered data onto the eigenvectors
P = vectors.T.dot(C.T)
print(P.T)
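
The listing above uses `eig`, which does not return the eigenvalues in any particular order. As a minimal sketch of one possible variant, the function below uses `numpy.linalg.eigh` (intended for symmetric matrices such as a covariance matrix) and sorts the components by decreasing variance. The name `pca_eigh` is a hypothetical helper introduced here, not part of NumPy. The last lines cross-check the eigenvalues against the singular values of the centered data.

```python
import numpy as np


def pca_eigh(X):
    """Hypothetical helper: PCA via eigendecomposition of the covariance matrix."""
    X = np.asarray(X, dtype=float)
    # center each column by subtracting its mean
    centered = X - X.mean(axis=0)
    # covariance of the columns (rowvar=False: each column is a variable)
    covariance = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    values, vectors = np.linalg.eigh(covariance)
    # reorder so the component with the largest variance comes first
    order = np.argsort(values)[::-1]
    values, vectors = values[order], vectors[:, order]
    # project the centered data onto the sorted eigenvectors
    projected = centered @ vectors
    return values, vectors, projected


A = np.array([[1, 2], [3, 4], [5, 6]])
values, vectors, projected = pca_eigh(A)
print(values)
print(projected)

# cross-check: eigenvalues should match squared singular values / (n - 1)
singular = np.linalg.svd(A - A.mean(axis=0), compute_uv=False)
print(singular ** 2 / (len(A) - 1))
```

The sign of each eigenvector is arbitrary, so the projected values may differ in sign from the output of the listing above while describing the same components.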