Partial Correlation in Python (clone of Matlab's partialcorr)
""" | |
Partial Correlation in Python (clone of Matlab's partialcorr) | |
This uses the linear regression approach to compute the partial | |
correlation (might be slow for a huge number of variables). The | |
algorithm is detailed here: | |
http://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression | |
Taking X and Y two variables of interest and Z the matrix with all the variable minus {X, Y}, | |
the algorithm can be summarized as | |
1) perform a normal linear least-squares regression with X as the target and Z as the predictor | |
2) calculate the residuals in Step #1 | |
3) perform a normal linear least-squares regression with Y as the target and Z as the predictor | |
4) calculate the residuals in Step #3 | |
5) calculate the correlation coefficient between the residuals from Steps #2 and #4; | |
The result is the partial correlation between X and Y while controlling for the effect of Z | |
Date: Nov 2014 | |
Author: Fabian Pedregosa-Izquierdo, [email protected] | |
Testing: Valentina Borghesani, [email protected] | |
""" | |
import numpy as np | |
from scipy import stats, linalg | |
def partial_corr(C): | |
""" | |
Returns the sample linear partial correlation coefficients between pairs of variables in C, controlling | |
for the remaining variables in C. | |
Parameters | |
---------- | |
C : array-like, shape (n, p) | |
Array with the different variables. Each column of C is taken as a variable | |
Returns | |
------- | |
P : array-like, shape (p, p) | |
P[i, j] contains the partial correlation of C[:, i] and C[:, j] controlling | |
for the remaining variables in C. | |
""" | |
C = np.asarray(C) | |
p = C.shape[1] | |
P_corr = np.zeros((p, p), dtype=np.float) | |
for i in range(p): | |
P_corr[i, i] = 1 | |
for j in range(i+1, p): | |
idx = np.ones(p, dtype=np.bool) | |
idx[i] = False | |
idx[j] = False | |
beta_i = linalg.lstsq(C[:, idx], C[:, j])[0] | |
beta_j = linalg.lstsq(C[:, idx], C[:, i])[0] | |
res_j = C[:, j] - C[:, idx].dot( beta_i) | |
res_i = C[:, i] - C[:, idx].dot(beta_j) | |
corr = stats.pearsonr(res_i, res_j)[0] | |
P_corr[i, j] = corr | |
P_corr[j, i] = corr | |
return P_corr |
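A minimal usage sketch (not part of the gist; the data are invented for illustration). With three variables where z is driven by both x and y, the marginal correlation of x and y is near zero, but their partial correlation given z is strongly negative:

import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.1 * rng.normal(size=n)   # z is (nearly) a linear combination of x and y
C = np.column_stack([x, y, z])
print(partial_corr(C))                 # the [0, 1] entry is close to -1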
there is a mistake in the two beta_* lines: the subscripts of the betas are incorrect (beta_i is fit against C[:, j] and beta_j against C[:, i])
@rubenSaro I guess it is correct, since it is consistent with the two residual lines below (beta_i is used to compute res_j, and beta_j to compute res_i). Although the subscripts are actually confusing.
Why not just -np.linalg.inv(np.corrcoef(C.T)), unless you really need the linear regression method?
In [24]: C = np.random.normal(0,1,(1000,5))
In [25]: partial_corr(C)
Out[25]:
array([[ 1. , 0.04377477, 0.05926928, -0.0048639 , -0.00949965],
[ 0.04377477, 1. , -0.02458582, -0.00286263, 0.00101031],
[ 0.05926928, -0.02458582, 1. , 0.00670762, -0.04408118],
[-0.0048639 , -0.00286263, 0.00670762, 1. , 0.02981604],
[-0.00949965, 0.00101031, -0.04408118, 0.02981604, 1. ]])
In [26]: -np.linalg.inv(np.corrcoef(C.T))
Out[26]:
array([[-1.00550451, 0.04393255, 0.05962884, -0.00486145, -0.009527 ],
[ 0.04393255, -1.0024175 , -0.02466733, -0.00284463, 0.00102981],
[ 0.05962884, -0.02466733, -1.0060574 , 0.0067103 , -0.04429945],
[-0.00486145, -0.00284463, 0.0067103 , -1.00095072, 0.0298542 ],
[-0.009527 , 0.00102981, -0.04429945, 0.0298542 , -1.00297644]])
In your opinion, what do partial correlation coefficients bigger than 1 in absolute value signify?
Those values exceed 1 because -np.linalg.inv(np.corrcoef(C.T)) is not normalized: its diagonal entries are -1/(1 - R_i^2), the negated variance inflation factors, rather than 1. If you normalize by the diagonal, it works:

import numpy as np

def pcor(data):
    # negated inverse of the covariance matrix (the precision matrix)
    X = -np.linalg.inv(np.cov(data.T))
    # rescale rows and columns by the square roots of the diagonal magnitudes
    stdev = np.sqrt(np.abs(np.diag(X)))
    X /= stdev[:, None]
    X /= stdev[None, :]
    np.fill_diagonal(X, 1.0)
    return X
For the original data this gives:
[[ 1. -0.54341003 -0.14076948]
[-0.54341003 1. -0.76207595]
[-0.14076948 -0.76207595 1. ]]
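As a sanity check (my own sketch, not part of the original comment): for centered data with more samples than variables, pcor agrees with the regression-based partial_corr above up to floating-point error. Centering matters because partial_corr's regressions have no intercept term, while np.cov centers internally.

import numpy as np

rng = np.random.default_rng(42)
C = rng.normal(size=(500, 4))
C = C - C.mean(axis=0)                        # center so the no-intercept regressions match
print(np.allclose(pcor(C), partial_corr(C)))  # True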
The same result via the inverse of the correlation matrix, with the original regression code replaced by the closed form:

def partial_corr(C):
    C = np.asarray(C)
    p = C.shape[1]
    P_corr = np.zeros((p, p), dtype=float)  # sample linear partial correlation coefficients
    corr = np.corrcoef(C, rowvar=False)     # Pearson product-moment correlation coefficients
    corr_inv = np.linalg.inv(corr)          # the (multiplicative) inverse of the correlation matrix
    for i in range(p):
        P_corr[i, i] = 1
        for j in range(i + 1, p):
            pcorr_ij = -corr_inv[i, j] / np.sqrt(corr_inv[i, i] * corr_inv[j, j])
            P_corr[i, j] = pcorr_ij
            P_corr[j, i] = pcorr_ij
    return P_corr
With this you get the same result as R's pcor.
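A quick consistency check (my own sketch): the loop above is just the elementwise normalization of the -np.linalg.inv(np.corrcoef(C.T)) suggestion from earlier in the thread, so the two agree once the raw inverse is rescaled by its diagonal:

import numpy as np

rng = np.random.default_rng(1)
C = rng.normal(size=(1000, 5))
P = partial_corr(C)                      # closed-form version above
raw = -np.linalg.inv(np.corrcoef(C.T))   # unnormalized suggestion from earlier
d = np.sqrt(np.abs(np.diag(raw)))
scaled = raw / d[:, None] / d[None, :]
np.fill_diagonal(scaled, 1.0)
print(np.allclose(P, scaled))            # True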
Maybe this should be added to the SciPy library.