{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "smap_l2_9km_gridding.ipynb",
"provenance": [],
"private_outputs": true,
"collapsed_sections": [],
"toc_visible": true,
"authorship_tag": "ABX9TyN3lorlzs3LpQwl35Nenbie",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/16902163c52e587436e473587b587f52/smap_l2_9km_gridding.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ASo2MzXPgJsn",
"colab_type": "text"
},
"source": [
"## Gridding SMAP level 2 Soil Moisture data\n",
"\n",
"This notebook example shows how one might access the 9km SMAP level 2 data, read the dataset, grid the data to a common projection, and save the data as a time series dataset.\n",
"\n",
"Total time ~10 minutes"
]
},
{
"cell_type": "code",
"metadata": {
"id": "ioqeIlbzzwLE",
"colab_type": "code",
"colab": {}
},
"source": [
"# install a package used to handle projections\n", | |
"!pip install pyproj h5netcdf --quiet" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "jvJ4RZoqq40h",
"colab_type": "code",
"colab": {}
},
"source": [
"# import packages\n",
"\n",
"%pylab inline\n",
"\n",
"import os\n",
"import glob\n",
"import h5py\n",
"import datetime\n",
"import xarray as xr\n",
"from pyproj import Transformer\n",
"from scipy.interpolate import griddata"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2KzAu9sGLZUP",
"colab_type": "code",
"colab": {}
},
"source": [
"# list of urls to access data\n",
"# this list can be created from search.earthdata.nasa.gov\n",
"\n",
"urls = [\n",
"    'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01056_D_20150413T184423_R13080_001.h5',\n",
"    'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01057_D_20150413T202249_R13080_001.h5',\n",
"    'https://n5eil01u.ecs.nsidc.org/SMAP/SPL2SMAP.003/2015.04.13/SMAP_L2_SM_AP_01058_D_20150413T220114_R13080_001.h5'\n",
"]"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "23zUA5OcPU_K",
"colab_type": "code",
"colab": {}
},
"source": [
"# download each url locally ... very simple fetching\n", | |
"# change the username and password to your NASA EarthData credentials\n", | |
"for url in urls:\n", | |
" os.system(f'wget --http-user=<EARTHDATA_USER> --http-password=<EARTHDATA_PASS> {url}')" | |
],
"execution_count": null,
"outputs": []
},
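{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) an alternative to hard-coding credentials in the cell above:\n",
"# keep them in ~/.netrc so wget can authenticate against Earthdata automatically.\n",
"# this follows the commonly documented Earthdata wget recipe; the exact flags\n",
"# needed can vary, so treat this as a starting point rather than a drop-in\n",
"netrc_path = os.path.expanduser('~/.netrc')\n",
"with open(netrc_path, 'w') as f:\n",
"    f.write('machine urs.earthdata.nasa.gov login <EARTHDATA_USER> password <EARTHDATA_PASS>\\n')\n",
"os.chmod(netrc_path, 0o600)  # restrict permissions on the credentials file\n",
"\n",
"# the fetch would then look something like this:\n",
"# for url in urls:\n",
"#     os.system(f'wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies --auth-no-challenge=on {url}')"
],
"execution_count": null,
"outputs": []
},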
{
"cell_type": "code",
"metadata": {
"id": "vOucoQhcPecZ",
"colab_type": "code",
"colab": {}
},
"source": [
"# list the files downloaded\n",
"h5Files = glob.glob('./*.h5')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "uCl4N1VJQlLJ",
"colab_type": "code",
"colab": {}
},
"source": [
"# open up one dataset and see what is going on insided\n", | |
"ds = h5py.File(h5Files[0],'r')\n", | |
"list(ds.keys())" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "nvyXZ96CQzGB",
"colab_type": "code",
"colab": {}
},
"source": [
"# looks like there are a few groups so we will see what is in the 'Soil_Moisture_Retrieval_Data'\n", | |
"sub = ds['Soil_Moisture_Retrieval_Data']\n", | |
"list(sub.keys())\n" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "xSILptjumCsC",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.close() # close file...don't want to leave it open"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5RYMkp6whWd5",
"colab_type": "text"
},
"source": [
"Lots of variables...we will focus on the geographic variables and `soil_moisture`. The following workflow will work for all variables but will need to be reworked to grid all variables and save as a dataset" | |
] | |
}, | |
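{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) peek at the metadata stored on the soil_moisture variable;\n",
"# attribute names vary by product version, so we just print whatever is there\n",
"with h5py.File(h5Files[0],'r') as tmp:\n",
"    sm_var = tmp['Soil_Moisture_Retrieval_Data']['soil_moisture']\n",
"    for key, val in sm_var.attrs.items():\n",
"        print(key, ':', val)"
],
"execution_count": null,
"outputs": []
},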
{
"cell_type": "markdown",
"metadata": {
"id": "eOfIFhoshtkE",
"colab_type": "text"
},
"source": [
"### Gridding process\n",
"\n",
"The SMAP data is a 1-d array of values with an associated geographic coordinate. This format isn't very friendly for GIS applications so we are going to take the 1-d structure and convert it into gridded raster dataset. Here is the overall workflow for the gridding:\n", | |
"1. Define projection information\n", | |
"2. Extract the data for each file\n", | |
"3. Grid the data to the native projection\n", | |
"4. Merge multiple timesteps to one 3-d dataset (y,x,t)\n", | |
"5. Save the data to a more friendly format" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "uhu80dH1anFl", | |
"colab_type": "code", | |
"colab": {} | |
}, | |
"source": [ | |
"# here we define the projection information\n", | |
"# the SMAP data's default projection is EASE2.0 (EPSG:6993)\n", | |
"# Luckily this is a nice projection to work with and the projection col,row info is defined in the SMAP data\n", | |
"\n", | |
"\n", | |
"# we will define the under lying grid to project the 1-d data to\n", | |
"# information taken from https://nsidc.org/ease/ease-grid-projection-gt\n", | |
"projOrigin = (-17367530.45,7314540.83) # x,y\n", | |
"projDims = (3856,1624) # cols, rows\n", | |
"projRes = (9008.05, 9008.05) # xResolution, yResolution\n", | |
"\n", | |
"# create arrays of the row,col information\n", | |
"col = np.arange(0,projDims[0],1).astype(np.int16)\n", | |
"row = np.arange(0,projDims[1],1).astype(np.int16)[::-1]\n", | |
"\n", | |
"# make a 2-d array of the column and row indices\n", | |
"xx,yy = np.meshgrid(col,row)" | |
], | |
"execution_count": null, | |
"outputs": [] | |
}, | |
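{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) quick sanity check on the grid definition above: the far corner\n",
"# implied by origin + dims * resolution should roughly mirror the origin, since\n",
"# the EASE-Grid 2.0 global grid is centered on (0,0); small offsets come from\n",
"# the rounded 9008.05 m cell size\n",
"farX = projOrigin[0] + projDims[0]*projRes[0]\n",
"farY = projOrigin[1] - projDims[1]*projRes[1]\n",
"print('upper-left  corner:', projOrigin)\n",
"print('lower-right corner:', (farX, farY))"
],
"execution_count": null,
"outputs": []
},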
{
"cell_type": "code",
"metadata": {
"id": "PEDLkG3RZRkF",
"colab_type": "code",
"colab": {}
},
"source": [
"# here is where we loop over the files read the data and perform the gridding\n", | |
"\n", | |
"# define empty lists to dump the gridded data and time info to\n", | |
"data = []\n", | |
"times = []\n", | |
"refTime = datetime.datetime(2000,1,1,0,0,0)\n", | |
"# loop over the files\n", | |
"for f in h5Files:\n", | |
" # open dataset and access the specific data we want\n", | |
" ds = h5py.File(f,'r')\n", | |
" sub = ds['Soil_Moisture_Retrieval_Data']\n", | |
" yidx = np.array(sub['EASE_row_index']) # get the row info\n", | |
" xidx = np.array(sub['EASE_column_index']) # get the col info\n", | |
" sm = np.array(sub['soil_moisture']) # get the soil moisture array\n", | |
" # change the sm variable access to a different variable from the earlier list if you would like\n", | |
" ds.close() # close file\n", | |
"\n", | |
" # perfrom the gridding based on the EASE col and row\n", | |
" smGridded = griddata(np.stack([yidx,xidx],axis=-1), sm, (yy,xx),fill_value=-9999.0,method='linear')\n", | |
"\n", | |
" # get the time infomation as string from the filename\n", | |
" t = datetime.datetime.strptime(f.split('_')[-3], '%Y%m%dT%H%M%S' )# convert string to datetime info while appending\n", | |
" \n", | |
" data.append(smGridded)\n", | |
" times.append((t-refTime).total_seconds()) " | |
],
"execution_count": null,
"outputs": []
},
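{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) if you wanted to grid several variables at once, one option is to\n",
"# loop over a list of variable names inside the file loop and keep a list of\n",
"# gridded arrays per variable; the names below are only examples and should be\n",
"# checked against the key listing printed earlier for your product version\n",
"varNames = ['soil_moisture', 'vegetation_water_content']  # example selection\n",
"multiData = {v: [] for v in varNames}\n",
"for f in h5Files:\n",
"    with h5py.File(f,'r') as src:\n",
"        grp = src['Soil_Moisture_Retrieval_Data']\n",
"        yi = np.array(grp['EASE_row_index'])\n",
"        xi = np.array(grp['EASE_column_index'])\n",
"        for v in varNames:\n",
"            vals = np.array(grp[v])\n",
"            gridded = griddata(np.stack([yi,xi],axis=-1), vals, (yy,xx), fill_value=-9999.0, method='linear')\n",
"            multiData[v].append(gridded)"
],
"execution_count": null,
"outputs": []
},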
{
"cell_type": "code",
"metadata": {
"id": "JNT37rcpbiq7",
"colab_type": "code",
"colab": {}
},
"source": [
"# the previous geographic information we used was a column or row index from the EASE2.0 projection\n", | |
"# we actually want the lat/lon coordinates from the EASE2.0 projection in the final file\n", | |
"# here we create the EASE coordinates as numpy arrays and grids\n", | |
"projX = np.arange(projOrigin[0],projOrigin[0]+((projDims[0])*projRes[0]),projRes[0]).astype(np.float32)\n", | |
"projY = np.arange(projOrigin[1]-((projDims[1])*projRes[1]),projOrigin[1],projRes[1]).astype(np.float32)\n", | |
"\n", | |
"xxProj,yyProj = np.meshgrid(projX,projY)" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "BSoWRaqjbms7",
"colab_type": "code",
"colab": {}
},
"source": [
"# convert the projection grid to lat/lon coordinates\n",
"# using proj to reproject EASE coordinates to lat/lon\n",
"transformer = Transformer.from_crs(\"epsg:6933\", \"epsg:4326\",always_xy=True)\n",
"projLon,projLat = transformer.transform(xxProj,yyProj)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "KCXD6aZaAlOW",
"colab_type": "code",
"colab": {}
},
"source": [
"imshow(projLat)" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "X4FerpqEAnh2",
"colab_type": "code",
"colab": {}
},
"source": [
"all(projLon[0,:] == projLon[700,:])" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "ixDRtlCEEWA-",
"colab_type": "code",
"colab": {}
},
"source": [
"# here we prep our data a little further before creating a 3-d dataset\n", | |
"\n", | |
"# we stack the 2-d soil moisture data along a third axis here using np.dstack (third dim is time)\n", | |
"data3D = np.dstack(data)\n", | |
"\n", | |
"# we extract out the gridded lat/long data to a 1-d array\n", | |
"# we can do this because the lat/long values are the same across the globe based on the projection\n", | |
"# this is a simple step that can save valuable storage space\n", | |
"xcoord = projLon[0,:].astype(np.float32)\n", | |
"ycoord = projLat[:,0].astype(np.float32)\n", | |
"times = np.array(times,dtype=np.int32)" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "OReG8W3TbuO_",
"colab_type": "code",
"colab": {}
},
"source": [
"# now that we have all of the data, we just need to combine it\n",
"# we are going to use xarray (my personal favorite for handling this kind of data)\n",
"# we specify the 3-d data and the coordinates\n",
"\n",
"ds = xr.Dataset({'soil_moisture': (['latitude', 'longitude', 'time'], data3D )},\n",
"                coords={'latitude': ycoord,\n",
"                        'longitude': xcoord,\n",
"                        'time': times})\n",
"\n",
"# # mask no data information - valid range of soil moisture data from SMAP is 0-1\n", | |
"ds = ds.where((ds >= 0) & (ds <= 1)).astype('float32')" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "-3ZcxAKHKsuW",
"colab_type": "code",
"colab": {}
},
"source": [
"ds.soil_moisture.attrs = {'long_name':'Representative soil moisture measurement for the Earth based grid cell','units':'cm**3/cm**3'}\n",
"ds.latitude.attrs = {'long_name':'Latitude','units':'degrees_north'}\n",
"ds.longitude.attrs = {'long_name':'Longitude','units':'degrees_east'}\n",
"ds.time.attrs = {'calendar':'proleptic_gregorian','units':'seconds since 2000-01-01'}"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "paPGdzeSdJTI",
"colab_type": "code",
"colab": {}
},
"source": [
"# check what is going on in the xarray dataset\n",
"ds"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "2uxyE1kKdUCn",
"colab_type": "code",
"colab": {}
},
"source": [
"# plot the three different time steps\n",
"ds.soil_moisture.plot(col='time',figsize=(15,5))\n",
"plt.show()"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "SObWh92pC7Dt",
"colab_type": "code",
"colab": {}
},
"source": [
"# before we save out the file we are going to \"chunk\" the dataset\n", | |
"# chunking breaks the data along a dimension into smaller pieces\n", | |
"# this increases read time but can *significantly* reduce file sizes\n", | |
"# we will chunk along lat/long so that internally we have 256x256 tiles\n", | |
"ds = ds.chunk({'latitude': 256, 'longitude': 256,'time':1})" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "C6ppHrr4Lst9",
"colab_type": "code",
"colab": {}
},
"source": [
"# the final thing we do (I know there has been a lot of data wrangling...)\n", | |
"# is to sort by time (sometimes it becomes unsorted when saving to disk)\n", | |
"ds = ds.sortby('time')" | |
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "God5m-ahdbpi",
"colab_type": "code",
"colab": {}
},
"source": [
"# save the dataset as a gridded netcdf file\n",
"# we provide some encoding information to help with compression too\n",
"ds.to_netcdf('gridded_smap.nc',engine='h5netcdf',\n",
"             encoding={'soil_moisture':{'_FillValue':-9999, # data encoding\n",
"                                        'zlib':True,'complevel':9,'shuffle':True} # compression encoding\n",
"             })"
],
"execution_count": null,
"outputs": []
},
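{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) read the file back to check that it round-trips; with the time\n",
"# attributes set above, xarray should decode the integer time values back into\n",
"# datetime64 timestamps and re-apply the _FillValue mask on read\n",
"check = xr.open_dataset('gridded_smap.nc', engine='h5netcdf')\n",
"print(check.time.values)\n",
"check.close()"
],
"execution_count": null,
"outputs": []
},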
{
"cell_type": "markdown",
"metadata": {
"id": "KoV63z4mkQM3",
"colab_type": "text"
},
"source": [
"So we accessed the SMAP data, performed some gridding and saved the file to a more GIS friendly format. Ideally, we would carry over the metadata from the HDF5 file, and use more variables from the original file but this example is a start for converting the data." | |
] | |
}, | |
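{
"cell_type": "code",
"metadata": {},
"source": [
"# (added sketch) one way the metadata carry-over mentioned above might look:\n",
"# walk the HDF5 'Metadata' group of the first granule and copy its attributes\n",
"# onto the xarray dataset (this would be done before the to_netcdf call above);\n",
"# the group layout and attribute names depend on the SMAP product version\n",
"extra_attrs = {}\n",
"with h5py.File(h5Files[0],'r') as src:\n",
"    if 'Metadata' in src:\n",
"        def _collect(name, obj):\n",
"            for k, v in obj.attrs.items():\n",
"                extra_attrs[f'{name}/{k}'] = str(v)\n",
"        src['Metadata'].visititems(_collect)\n",
"ds.attrs.update(extra_attrs)\n",
"len(extra_attrs)"
],
"execution_count": null,
"outputs": []
},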
{
"cell_type": "code",
"metadata": {
"id": "4BzTeoQMOG-x",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
} |