@rcdilorenzo
Last active April 18, 2022 15:40
How to calculate a covariance matrix
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
def covariance(M: np.ndarray) -> np.ndarray:
    """
    Compute sample covariance matrix from data M.

    NOTE: M is assumed to be of shape (nrows, ncols).

    B = M - mean(M) (by column)
    covariance = B^T × B / (N - 1)
    """
    N = M.shape[0]
    B = M - np.mean(M, axis=0)
    return B.T @ B / (N - 1)
# Load matrix of iris features
V = load_iris()["data"]
# Get sklearn covariance implementation
pca = PCA()
pca.fit(V)
# Assert matches implementation from sklearn
assert np.allclose(pca.get_covariance(), covariance(V))
# Assert matches numpy.cov (which treats rows as variables by default, hence the transpose)
assert np.allclose(np.cov(V.T), covariance(V))
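
The same formula can be checked by hand on a small matrix. The sketch below repeats the `covariance` function so it runs on its own and applies it to a made-up 3×2 array `X`. This array is a hypothetical example, not part of the original gist. It also shows two properties that follow from the definition: the result is symmetric, and its diagonal holds the per-column sample variances.

```python
import numpy as np


def covariance(M: np.ndarray) -> np.ndarray:
    """Sample covariance of M, with observations in rows and features in columns."""
    N = M.shape[0]
    B = M - np.mean(M, axis=0)  # center each column
    return B.T @ B / (N - 1)


# Hypothetical toy data: 3 observations (rows) of 2 features (columns).
X = np.array([[1.0, 2.0],
              [3.0, 6.0],
              [5.0, 10.0]])

# Column means are [3, 6], so B = [[-2, -4], [0, 0], [2, 4]]
# and B^T B / (N - 1) = [[8, 16], [16, 32]] / 2.
C = covariance(X)

expected = np.array([[4.0, 8.0],
                     [8.0, 16.0]])
assert np.allclose(C, expected)

# A covariance matrix is symmetric.
assert np.allclose(C, C.T)

# The diagonal holds the sample variance (ddof=1) of each column.
assert np.allclose(np.diag(C), np.var(X, axis=0, ddof=1))

# rowvar=False tells numpy.cov that columns are the variables,
# which avoids the explicit transpose.
assert np.allclose(np.cov(X, rowvar=False), C)
```

Passing `rowvar=False` is equivalent to transposing the input first; which form to use is a matter of preference.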