@cdipaolo
Last active August 11, 2018 18:01
import numpy as np


def whiten(X):
    '''whiten

    Takes a data matrix X in R^{n \times p} and returns a matrix
    Y with zero column mean and identity covariance. Assumes
    the data has full column rank. For reference, with n = 10000
    and p = 400 this takes roughly 150 ms.

    :param X: Data matrix where rows are samples and cols are features.
    :type X: np.array[n,p]
    :returns: Whitened data matrix.
    :rtype: np.array[n,p]

    >>> X = np.random.randn(100,5)
    >>> np.allclose(np.cov(whiten(X), rowvar=False, bias=True), np.eye(5))
    True
    '''
    # subtract the column means so Y has zero mean
    Y = X - X.mean(axis=0)
    # Cholesky factor of the biased sample covariance: cov = L @ L.T
    L = np.linalg.cholesky(Y.T @ Y / Y.shape[0])
    # right-multiplying by L^{-T} gives identity covariance
    L_inv = np.linalg.inv(L)
    return Y @ L_inv.T
@cdipaolo
This could probably be sped up by using a lower-triangular solver instead of computing the inverse explicitly. See, e.g., scipy.linalg.solve_triangular.
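A minimal sketch of that variant, assuming scipy.linalg.solve_triangular is the solver meant above (the name whiten_solve is just illustrative); it computes Y @ inv(L).T by back-substitution against L rather than forming the inverse:

import numpy as np
from scipy.linalg import solve_triangular

def whiten_solve(X):
    '''Whiten X via a triangular solve rather than an explicit inverse.

    Solving L Z = Y.T for Z gives Z = L^{-1} Y.T, so Z.T = Y L^{-T},
    the same result as Y @ inv(L).T without ever forming inv(L).
    '''
    Y = X - X.mean(axis=0)
    # Cholesky factor of the biased sample covariance: cov = L @ L.T
    L = np.linalg.cholesky(Y.T @ Y / Y.shape[0])
    # back-substitution against the lower-triangular factor
    return solve_triangular(L, Y.T, lower=True).T

The triangular solve does the same O(n p^2) work as the multiply by L^{-T} but skips the O(p^3) inversion, and avoiding the explicit inverse is generally better conditioned numerically.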
