Created February 23, 2017 11:46
Pearson correlation coefficient calculation according to Bretherton's effective sample size
import numpy as np
from scipy import special


def autocorr(x, t=1):
    """Calculates the lag-t autocorrelation (default t = 1).

    Parameters
    ----------
    x : numpy.array
        Input series.
    t : int, optional
        Lag (default 1).

    Returns
    -------
    ac : numpy.float64
        Autocorrelation at lag t.
    """
    ac = np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))[0][1]
    return ac
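As a quick sanity check (illustrative, not part of the gist), the lag-1 estimate from `autocorr` should land near the true coefficient of a simulated AR(1) process when the series is long; the seed, coefficient, and length below are invented for the example:

```python
import numpy as np

def autocorr(x, t=1):
    """Lag-t autocorrelation, as in the gist."""
    return np.corrcoef(np.array([x[:len(x) - t], x[t:]]))[0][1]

# Simulate x_t = phi * x_{t-1} + e_t; the lag-1 autocorrelation of an
# AR(1) process is phi, so the estimate should be close to 0.6.
rng = np.random.default_rng(42)
phi, n = 0.6, 10_000
x = np.zeros(n)
e = rng.standard_normal(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + e[i]

print(autocorr(x))  # close to 0.6
```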
def betai(a, b, x):
    """
    Returns the incomplete beta function.

    I_x(a,b) = 1/B(a,b)*(Integral(0,x) of t^(a-1)(1-t)^(b-1) dt)

    where a,b>0 and B(a,b) = G(a)*G(b)/(G(a+b)) where G(a) is the gamma
    function of a.

    The standard broadcasting rules apply to a, b, and x.

    Parameters
    ----------
    a : array_like or float > 0
    b : array_like or float > 0
    x : array_like or float
        x will be clipped to be no greater than 1.0.

    Returns
    -------
    betai : ndarray
        Incomplete beta function.

    References
    ----------
    https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/stats.py#L2392
    """
    x = np.asarray(x)
    x = np.where(x < 1.0, x, 1.0)  # if x > 1 then return 1.0
    return special.betainc(a, b, x)
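For context, the expression used later, `betai(0.5*df, 0.5, df/(df + t_squared))`, is the standard identity for the two-sided p-value of a t statistic. A quick check against `scipy.stats.t` (an illustration, not part of the gist; the values of `t` and `df` are arbitrary):

```python
import numpy as np
from scipy import special, stats

# For a t statistic with df degrees of freedom,
# I_{df/(df + t^2)}(df/2, 1/2) equals the two-sided p-value 2 * P(T > |t|).
t, df = 2.5, 20.0
p_beta = special.betainc(0.5 * df, 0.5, df / (df + t * t))
p_t = 2.0 * stats.t.sf(abs(t), df)
print(np.isclose(p_beta, p_t))  # True
```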
def pearsonr_moje(x, y):
    """Determines the Pearson correlation coefficient, the p-value and a
    confidence interval estimate according to the effective sample size
    from eq. 31 in Bretherton et al. (1999).

    Parameters
    ----------
    x : numpy.array
        Input series.
    y : numpy.array
        Input series.

    Returns
    -------
    r : numpy.float64
        Pearson correlation coefficient.
    prob : numpy.float64
        Two-sided p-value based on the effective degrees of freedom.
    ci : numpy.float64
        Confidence interval estimate.

    References
    ----------
    http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(1999)012%3C1990%3ATENOSD%3E2.0.CO%3B2
    """
    rx = autocorr(x, t=1)
    ry = autocorr(y, t=1)
    # Effective degrees of freedom: n*(1 - rx*ry)/(1 + rx*ry) - 2 (eq. 31).
    df = x.shape[0] * (1 - rx * ry) / (1 + rx * ry) - 2
    r = np.corrcoef(x, y)[0][1]
    t_squared = r * r * (df / ((1.0 - r) * (1.0 + r)))
    prob = betai(0.5 * df, 0.5, df / (df + t_squared))
    ci = 1.96 / np.sqrt(df)
    return r, prob, ci
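A possible usage sketch follows. The seed, AR(1) coefficients, and series length are invented for illustration, and the gist's computation is reproduced inline so the snippet runs on its own:

```python
import numpy as np
from scipy import special

# Two correlated AR(1) series (parameters chosen arbitrarily for the demo).
rng = np.random.default_rng(0)
n, phi = 500, 0.7
eps = rng.standard_normal((2, n))
x = np.zeros(n)
y = np.zeros(n)
for i in range(1, n):
    x[i] = phi * x[i - 1] + eps[0, i]
    y[i] = phi * y[i - 1] + 0.5 * x[i] + eps[1, i]

def autocorr(v, t=1):
    return np.corrcoef(np.array([v[:len(v) - t], v[t:]]))[0][1]

rx, ry = autocorr(x), autocorr(y)
df = n * (1 - rx * ry) / (1 + rx * ry) - 2   # Bretherton et al. (1999), eq. 31
r = np.corrcoef(x, y)[0][1]
t2 = r * r * (df / ((1.0 - r) * (1.0 + r)))
prob = special.betainc(0.5 * df, 0.5, df / (df + t2))

print(f"r = {r:.3f}, effective df = {df:.1f} (vs n - 2 = {n - 2}), p = {prob:.2g}")
```

The effective degrees of freedom come out well below `n - 2`, which is the point of the correction: serial correlation in both series reduces the number of independent samples.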
I am currently applying an 11-year running (rolling) mean to two time series, each containing 45 time steps. After smoothing, I observed that the lag-1 autocorrelation is quite high, around 0.96.
Given this strong autocorrelation, I understand that using the usual confidence interval formula with sqrt(df) in the denominator may no longer be appropriate.
I was wondering if it might be reasonable to consider using the lag-4 autocorrelation instead, to better reflect the reduced degrees of freedom. However, I'm not entirely sure this is the best approach.
If you have any suggestions or guidance on how to handle this situation more appropriately, I would be very grateful.
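A small numeric illustration of the issue described above (assuming white noise and an 11-point window, both invented for the demo): even for independent data, an 11-point running mean pushes the lag-1 autocorrelation toward ~0.9, so the effective sample size from eq. 31 collapses:

```python
import numpy as np

rng = np.random.default_rng(1)
n, w = 45, 11
# Draw enough white noise that the smoothed series has exactly n points.
raw = rng.standard_normal(n + w - 1)
smoothed = np.convolve(raw, np.ones(w) / w, mode="valid")   # length n

def autocorr(v, t=1):
    return np.corrcoef(np.array([v[:len(v) - t], v[t:]]))[0][1]

r1 = autocorr(smoothed)
# Eq. 31 with rx = ry = r1 (both series smoothed the same way).
n_eff = n * (1 - r1 * r1) / (1 + r1 * r1)

print(f"lag-1 autocorrelation after smoothing: {r1:.2f}")
print(f"effective sample size: {n_eff:.1f} of {n}")
```

The theoretical lag-1 autocorrelation of an 11-point running mean of white noise is 10/11 ≈ 0.91, so a value near 0.96 on a real, already-autocorrelated series is unsurprising, and the effective sample size drops to a small fraction of the nominal 45.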