@kuchaale
Created February 23, 2017 11:46
Pearson correlation coefficient calculation according to Bretherton's effective sample size
import numpy as np
from scipy import special


def autocorr(x, t=1):
    """Calculate the lag-t autocorrelation (default lag = 1).

    Parameters
    ----------
    x : numpy.ndarray
        Input series.
    t : int, optional
        Lag (default 1).

    Returns
    -------
    ac : numpy.float64
        Autocorrelation at lag t.
    """
    ac = np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))[0][1]
    return ac
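As a quick sanity check (a standalone sketch; the series below are illustrative, not from the gist): a linear trend is an exact linear function of its lagged copy, so its lag-1 autocorrelation is essentially 1, while white noise should give a value near 0.

```python
import numpy as np


def autocorr(x, t=1):
    # Correlate the series with its lag-t copy (same form as the gist's helper).
    return np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))[0][1]


# A linear trend is perfectly correlated with its shifted self.
ac_trend = autocorr(np.arange(20.0))

# White noise has near-zero lag-1 autocorrelation.
rng = np.random.default_rng(42)
ac_noise = autocorr(rng.standard_normal(5000))

print(ac_trend, ac_noise)  # ~1.0 and ~0.0
```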
def betai(a, b, x):
    """Return the incomplete beta function.

    I_x(a, b) = 1/B(a, b) * (Integral(0, x) of t^(a-1)*(1-t)^(b-1) dt)

    where a, b > 0 and B(a, b) = G(a)*G(b)/G(a+b), with G(a) the gamma
    function of a.

    The standard broadcasting rules apply to a, b, and x.

    Parameters
    ----------
    a : array_like or float > 0
    b : array_like or float > 0
    x : array_like or float
        x will be clipped to be no greater than 1.0.

    Returns
    -------
    betai : ndarray
        Incomplete beta function.

    References
    ----------
    https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/stats.py#L2392
    """
    x = np.asarray(x)
    x = np.where(x < 1.0, x, 1.0)  # if x > 1 then return 1.0
    return special.betainc(a, b, x)
def betai_df(x, df):
    """Convenience variant with a = df/2, b = 1/2 fixed.

    Renamed from a second ``betai`` definition, which would have shadowed
    the three-argument ``betai`` above and broken ``pearsonr_moje``.
    """
    a = 0.5 * df
    b = 0.5
    x = np.asarray(x)
    x = np.where(x < 1.0, x, 1.0)  # if x > 1 then return 1.0
    return special.betainc(a, b, x)
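The incomplete-beta form used here is the standard identity behind the two-sided t-test p-value: I_x(df/2, 1/2) with x = df/(df + t²) equals 2·P(T > |t|) for a t-distribution with df degrees of freedom. A minimal check against scipy.stats (the values of df and t below are illustrative):

```python
import numpy as np
from scipy import special, stats

df, t = 10.0, 2.5  # illustrative values

# Regularized incomplete beta, as used by the gist's betai().
p_beta = special.betainc(0.5 * df, 0.5, df / (df + t * t))

# Two-sided p-value from the t-distribution's survival function.
p_t = 2.0 * stats.t.sf(abs(t), df)

print(p_beta, p_t)  # the two agree
```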
def pearsonr_moje(x, y):
    """Determine the Pearson correlation coefficient, the p-value and a
    confidence interval estimate based on the effective sample size
    from eq. 31 in Bretherton et al. (1999).

    Parameters
    ----------
    x : numpy.ndarray
        Input series.
    y : numpy.ndarray
        Input series.

    Returns
    -------
    r : numpy.float64
        Pearson correlation coefficient.
    prob : numpy.float64
        Two-sided p-value.
    ci : numpy.float64
        Confidence interval estimate.

    References
    ----------
    http://journals.ametsoc.org/doi/abs/10.1175/1520-0442(1999)012%3C1990%3ATENOSD%3E2.0.CO%3B2
    """
    rx = autocorr(x, t=1)
    ry = autocorr(y, t=1)
    # Effective degrees of freedom (Bretherton et al. 1999, eq. 31):
    # n_eff = n * (1 - rx*ry) / (1 + rx*ry), minus 2 for the fit.
    df = x.shape[0] * (1 - rx * ry) / (1 + rx * ry) - 2
    r = np.corrcoef(x, y)[0][1]
    t_squared = r * r * (df / ((1.0 - r) * (1.0 + r)))
    prob = betai(0.5 * df, 0.5, df / (df + t_squared))
    ci = 1.96 / np.sqrt(df)
    return r, prob, ci
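A usage sketch on synthetic AR(1) data (the data and the compact restatement of the gist's helpers below are illustrative, so the snippet runs standalone). Because both series are autocorrelated, the effective degrees of freedom shrink relative to n, and the confidence interval widens compared with the naive 1.96/sqrt(n - 2):

```python
import numpy as np
from scipy import special


# Compact standalone restatement of the gist's helpers for this demo.
def autocorr(x, t=1):
    return np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))[0][1]


def pearsonr_moje(x, y):
    rx, ry = autocorr(x), autocorr(y)
    # Effective degrees of freedom (Bretherton et al. 1999, eq. 31).
    df = x.shape[0] * (1 - rx * ry) / (1 + rx * ry) - 2
    r = np.corrcoef(x, y)[0][1]
    t_squared = r * r * (df / ((1.0 - r) * (1.0 + r)))
    # df / (df + t^2) lies in (0, 1], so no clipping is needed here.
    prob = special.betainc(0.5 * df, 0.5, df / (df + t_squared))
    ci = 1.96 / np.sqrt(df)
    return r, prob, ci


# Two correlated AR(1) series: a shared persistent signal plus noise.
rng = np.random.default_rng(0)
n = 200
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.6 * x[i - 1] + rng.standard_normal()
y = x + 0.5 * rng.standard_normal(n)

r, prob, ci = pearsonr_moje(x, y)
print(r, prob, ci)
# ci is wider than the naive 1.96 / np.sqrt(n - 2) because the
# autocorrelation of both series reduces the effective sample size.
```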
@amolvibhuteIITM

I am currently applying an 11-year running (rolling) mean to two time series, each containing 45 time steps. After smoothing, I observed that the lag-1 autocorrelation is quite high, around 0.96.

Given this strong autocorrelation, I understand that using the usual confidence-interval formula with sqrt(df) in the denominator may no longer be appropriate.

I was wondering if it might be reasonable to consider using lag-4 autocorrelation instead, to better reflect the reduced degrees of freedom. However, I’m not entirely sure if this is the best approach.

If you have any suggestions or guidance on how to more appropriately handle this situation, I would be very grateful.
