Last active
February 26, 2020 21:25
-
-
Save kirarpit/9c454a76d94d2860a9ba6bc222239cba to your computer and use it in GitHub Desktop.
Reproduction of NumPy's covariance and correlation matrices from scratch in Python
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
def covariance(X): | |
""" | |
Computes the covariance matrix of X | |
:param X: Input data of shape (N, M) where N is the | |
number of samples and M is the number of | |
features | |
:return: Output data of shape (M, M) where Output[i][j] | |
is Cov(feature_i, feature_j) | |
""" | |
mean = np.mean(X, axis=0) | |
X_zero_centered = X - mean | |
return X_zero_centered.T.dot(X_zero_centered)/(len(X) - 1) # Note: there is one less degree of freedom | |
def correlation(X): | |
""" | |
Computes the correlation matrix of X | |
:param X: Input data of shape (N, M) where N is the | |
number of samples and M is the number of | |
features | |
:return: Output data of shape (M, M) where Output[i][j] | |
is Corr(feature_i, feature_j) | |
""" | |
mean = np.mean(X, axis=0) | |
std = np.std(X, axis=0) | |
X_zero_centered = X - mean | |
cov = X_zero_centered.T.dot(X_zero_centered)/len(X) # Note: here it's divided by len(X) instead of len(X) - 1 | |
return (cov/std)/std.reshape((-1, 1)) | |
X =[[0.2, 0.3, 1. ], | |
[0. , 0.6, 1. ], | |
[0.6, 0. , 3. ], | |
[0.2, 0.1, 1. ]] | |
X = np.array(X) | |
assert (np.cov(X.T) - covariance(X) < 1e-15).all() | |
assert (np.corrcoef(X.T) - correlation(X) < 1e-15).all() |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
A few things to note,
f(X) == f(X - np.mean(X, axis=0))
where f could be both, covariance and correlation.covariance(X_std) == correlation(X)
whereX_std = (X-np.mean(X, axis=0))/X.std(axis=0)
. However, since covariance useslen(X) - 1
instead oflen(X)
the results are slightly off.