@rabernat
Created June 1, 2016 12:00
import xarray as xr
import numpy as np

# create an example dataset with a numeric time coordinate
# (the trend function below reads x.time, so the coordinate must exist)
da = xr.DataArray(np.random.rand(10, 30, 40),
                  dims=['time', 'lat', 'lon'],
                  coords={'time': np.arange(10)})

# define a function to compute a linear trend of a timeseries
def linear_trend(x):
    pf = np.polyfit(x.time, x, 1)
    # we need to return a dataarray or else xarray's groupby won't be happy
    return xr.DataArray(pf[0])

# stack lat and lon into a single dimension called allpoints
stacked = da.stack(allpoints=['lat', 'lon'])
# apply the function over allpoints to calculate the trend at each point
trend = stacked.groupby('allpoints').apply(linear_trend)
# unstack back to lat lon coordinates
trend_unstacked = trend.unstack('allpoints')
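
For reference, a quick sanity check of the result, using the example array above:

print(trend_unstacked.dims)   # ('lat', 'lon')
print(trend_unstacked.shape)  # (30, 40)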
@sisams commented Aug 30, 2020

Hi!
I'm trying to understand how .apply works with groupby.
This code works fine for a DataArray with no missing values, or only a few per grid point (I just added a line inside the function to drop times with missing values). However, I have a DataArray [time, lat, lon] where some lat/lon points have no finite data at all (all np.nan). When I try to drop such points and return xr.DataArray(np.nan), I get 'SVD did not converge in Linear Least Squares.' The same approach worked fine on DataArrays with only a few missing values per grid point. How does dropna behave for groupby groups with no valid data?

def linear_trend(x):
    x = x.dropna(dim='time')
    time = np.arange(len(x))
    pf = np.polyfit(time, x, 1)
    return xr.DataArray(pf[0])
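
One way to avoid the 'SVD did not converge' failure is to guard against groups with too few valid points before calling polyfit; a minimal sketch (returning NaN keeps those grid points in the output grid):

def linear_trend(x):
    x = x.dropna(dim='time')
    # an all-NaN grid point is empty after dropna; polyfit needs at least 2 points
    if len(x) < 2:
        return xr.DataArray(np.nan)
    pf = np.polyfit(np.arange(len(x)), x, 1)
    return xr.DataArray(pf[0])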

@gopsur commented Oct 6, 2021

I am calculating the r-value (correlation coefficient) along the time axis between a 3D array and a 1D array.

My 3D array has shape (240 time, 180 longitude, 37 latitude) and my 1D array has shape (240 time).

I want to iterate and calculate the r-value over a growing range of the time series: first for the first 60 time steps, then the first 61, the first 62, and so on up to the first 240.

That is, t1-t60 [3D array vs 1D array], t1-t61 [3D array vs 1D array], t1-t62 [3D array vs 1D array], ... t1-t240 [3D array vs 1D array].

I want to save all these values in one array that also contains latitude and longitude, keeping longitude and latitude unchanged (i.e. calculating this for every single grid point).

Finally, I want to get an array with a shape like (all series r-values, all longitudes, all latitudes).

Is there any way to do this?


import scipy.stats

def corr_coeff(point_series, reference):
    # r-value between one grid point's time series and the normalized reference
    r = scipy.stats.linregress(reference, point_series, alternative='two-sided')
    return xr.DataArray(r.rvalue)

r_maps = []
for k in range(60, 241):
    # normalize the 1D reference series by its std deviation
    # (scaling, including the original division by k, does not change the r-value)
    ref = one_point_ds2[0:k] / np.std(one_point_ds2[0:k])
    stacked = one_model[0:k].stack(allpoints=['NAV_LAT', 'NAV_LON'])
    r_k = stacked.groupby('allpoints').apply(corr_coeff, reference=ref)
    r_maps.append(r_k.unstack('allpoints'))
result = xr.concat(r_maps, dim='series')  # shape: (series, NAV_LAT, NAV_LON)
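
A note on the last step: np.append converts xarray objects to a flat numpy array and drops the coordinates, which is why saving the maps that way was not working. Collecting the per-window maps in a list and calling xr.concat along a new 'series' dimension keeps NAV_LAT/NAV_LON intact and yields exactly the (all series r-values, longitude, latitude) shape asked for above.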


@khandokershanto
# Just a little addition for netCDF climate datasets
def linear_trend(x):
    date = x.time
    ndate = np.arange(len(date))
    pf = np.polyfit(ndate, x, 1)
    # we need to return a dataarray or else xarray's groupby won't be happy
    return xr.DataArray(pf[0])
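
A hedged variant if the time axis holds datetimes and you want the slope in physical units (per day) rather than per time step; the timedelta64 conversion is the only new assumption here:

def linear_trend_per_day(x):
    # convert datetimes to fractional days since the first sample
    t = (x.time - x.time[0]) / np.timedelta64(1, 'D')
    pf = np.polyfit(t, x, 1)
    return xr.DataArray(pf[0])  # slope in data units per day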

@sudhansu-s-rath

How is xarray.polyfit different from this?
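
For comparison, a minimal sketch using xarray's built-in polyfit (available since xarray v0.16), which fits every grid point at once and avoids the stack/groupby round trip:

# fit a degree-1 polynomial along time at every grid point simultaneously
fit = da.polyfit(dim='time', deg=1)
# the degree-1 coefficient is the linear trend, with dims (lat, lon)
slope = fit.polyfit_coefficients.sel(degree=1)

This gives the same slopes as the groupby approach above (both fit against the time coordinate) but is typically much faster, since it avoids one Python function call per grid point.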
