Last active
August 11, 2018 18:01
-
-
Save cdipaolo/2d452688ddd378d6382dd68218c6ffa7 to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
def whiten(X): | |
'''whiten | |
Takes a data matrix X in R^{n\times p} and returns a matrix | |
Y with zero column mean and identity covariance. Assumes | |
your data has full column rank. For speed, if n is 10000 | |
and p is 400, this takes about 150ms, for example. | |
:param X: Data matrix where rows are samples and cols are features. | |
:type X: np.array[n,p] | |
:returns: Whitened data matrix. | |
:rtype: np.array[n,p] | |
>>> X = np.random.randn(100,5) | |
>>> np.allclose(np.cov(whiten(X),rowvar=False,bias=True), np.eye(5)) | |
True | |
''' | |
# subtract mean | |
Y = X - X.mean(axis=0) | |
L = np.linalg.cholesky(Y.T @ Y / Y.shape[0]) | |
L_inv = np.linalg.inv(L) | |
return Y @ L_inv.T |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This could probably be sped up by using a lower triangular solver instead of computing the inverse. See, eg., this scipy function.