{
"cells": [
{
"cell_type": "markdown",
"id": "a588e7f9-5f3a-43b4-b31a-6171feef0690",
"metadata": {},
"source": [
"# Data Access with STAC\n",
"\n",
"This notebook demonstrates using the Planetary Computer's STAC API to find and load data of interest."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91efaeaf-14f7-47f9-8e42-101572480215",
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"\n",
"client = Client()\n",
"client"
]
},
{
"cell_type": "markdown",
"id": "fe2289ea-fd87-4242-bd81-9eec607a9917",
"metadata": {},
"source": [
"## Remote sensing\n",
"\n",
"This demonstrates the motivating use-case for STAC. Given a massive collection of COGs, for example [Sentinel-2 Level-2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a), find just the ones that match some particular conditions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d94a8e7-ba8d-47e1-99ce-01fbf3b95a1b",
"metadata": {},
"outputs": [],
"source": [
"import pystac_client\n",
"\n",
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
")\n",
"\n",
"search = catalog.search(\n",
" collections=[\"sentinel-2-l2a\"],\n",
" intersects={\n",
" \"type\": \"Polygon\",\n",
" \"coordinates\": [\n",
" [\n",
" [-148.56536865234375, 60.80072385643073],\n",
" [-147.44338989257812, 60.80072385643073],\n",
" [-147.44338989257812, 61.18363894915102],\n",
" [-148.56536865234375, 61.18363894915102],\n",
" [-148.56536865234375, 60.80072385643073],\n",
" ]\n",
" ],\n",
" },\n",
" datetime=\"2019-06-01/2019-08-01\",\n",
" query={\"eo:cloud_cover\": {\"lt\": 10}}\n",
")\n",
"\n",
"%time items = list(search.get_all_items())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2eab338-3a6f-4d67-8664-d6e7548f783c",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce74b616-1411-4e8c-ba7c-dd2698e8b784",
"metadata": {},
"outputs": [],
"source": [
"import geopandas\n",
"from folium.features import GeoJson\n",
"\n",
"gdf = geopandas.GeoDataFrame.from_features(\n",
" [item.to_dict() for item in items],\n",
" columns=[\"geometry\", \"datetime\", \"eo:cloud_cover\"],\n",
").set_crs(4326)\n",
"gdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de70fd08-f9c7-4fb7-8812-7902d25c7d98",
"metadata": {},
"outputs": [],
"source": [
"m = gdf.explore()\n",
"\n",
"layer = GeoJson(\n",
" data=search.get_parameters()[\"intersects\"],\n",
" style_function=lambda x: {\"color\": \"yellow\"}\n",
")\n",
"layer.add_to(m)\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee6ab434-768f-4ab7-a7a5-6f1126658c85",
"metadata": {},
"outputs": [],
"source": [
"from pystac.extensions.eo import EOExtension as eo\n",
"\n",
"least_cloudy_item = min(items, key=lambda item: eo.ext(item).cloud_cover)"
]
},
{
"cell_type": "markdown",
"id": "d4f4618d-2114-4fc2-a615-39020abcf0a9",
"metadata": {},
"source": [
"The Planetary Computer requires that you \"sign\" your items / assets before opening them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3c0411d-4741-43f3-b793-d483e524f800",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"r = requests.get(least_cloudy_item.assets[\"B02\"].href)\n",
"r.status_code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faf479a6-3e70-4e3d-92b1-3f6de31082f7",
"metadata": {},
"outputs": [],
"source": [
"import planetary_computer\n",
"\n",
"signed_item = planetary_computer.sign(least_cloudy_item)"
]
},
{
"cell_type": "markdown",
"id": "c3f5b230-330c-409f-960b-4077fbd2fbaf",
"metadata": {},
"source": [
"Signing an item appends a short-lived, read-only SAS token to each asset's URL. This helps us track what data is being used and, most importantly, control our egress costs.\n",
"\n",
"You can sign assets anonymously, so you don't need a Planetary Computer account to use the data (or the STAC API)."
]
},
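{
"cell_type": "markdown",
"id": "b7d1f0a2-1c4e-4e8a-9f3b-2a5c6d7e8f90",
"metadata": {},
"source": [
"As a quick sanity check (a sketch added here, not part of the original workflow), the signed href should be the unsigned href with the SAS token appended as a query string."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8e2a1b3-2d5f-4f9b-8a4c-3b6d7e8f9a01",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: signing leaves the base URL unchanged and appends a SAS token.\n",
"unsigned_href = least_cloudy_item.assets[\"B02\"].href\n",
"signed_href = signed_item.assets[\"B02\"].href\n",
"\n",
"print(signed_href.startswith(unsigned_href))  # base URL unchanged\n",
"print(\"?\" in signed_href)  # SAS token appended as query parameters"
]
},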
{
"cell_type": "code",
"execution_count": null,
"id": "4eb0ba13-8b5e-4712-a4f1-d0753673fa03",
"metadata": {},
"outputs": [],
"source": [
"import stackstac\n",
"\n",
"da = (\n",
" stackstac.stack(\n",
" signed_item.to_dict(),\n",
" assets=[\"B02\", \"B03\", \"B04\", \"B08\"]\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .squeeze()\n",
")\n",
"da"
]
},
{
"cell_type": "markdown",
"id": "2c37e83d-c451-4e47-bbee-a7cc2c5b30ac",
"metadata": {},
"source": [
"Now we can use xarray's high-level APIs for analysis (e.g. band math, to compute [NDVI](https://www.usgs.gov/core-science-systems/eros/phenology/science/ndvi-foundation-remote-sensing-phenology?qt-science_center_objects=0#))."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a94846-88bf-4834-ab03-eb19c5f95b9c",
"metadata": {},
"outputs": [],
"source": [
"red = da.sel(band=\"B04\")\n",
"nir = da.sel(band=\"B08\")\n",
"\n",
"ndvi = (nir - red) / (nir + red)\n",
"ndvi"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7068913d-4f22-4839-9373-484c393f0706",
"metadata": {},
"outputs": [],
"source": [
"ndvi = ndvi.persist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72d7e5b3-2387-4d4d-90ef-467d79d0b3bf",
"metadata": {},
"outputs": [],
"source": [
"from ipyleaflet import FullScreenControl\n",
"\n",
"m = stackstac.show(ndvi, range=[-0.6, 0.6])\n",
"m.scroll_wheel_zoom = True\n",
"m.add_control(FullScreenControl())\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "87cdb8ba-97d8-4152-bd4e-0eaee331b3bc",
"metadata": {},
"source": [
"This approach is especially useful when building a datacube with multiple time-steps and multiple scenes in the `x` or `y` direction. `stackstac` knows how to arrange these items in space and time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d860f15c-cad7-401f-a688-0ddf8e04d824",
"metadata": {},
"outputs": [],
"source": [
"search = catalog.search(\n",
" bbox=[-122.20, 47.50, -121.99, 47.80],\n",
" datetime=\"2016-01-01/2020-12-31\",\n",
" collections=[\"sentinel-2-l2a\"],\n",
" limit=500, # fetch items in batches of 500\n",
" query={\"eo:cloud_cover\": {\"lt\": 25}},\n",
")\n",
"\n",
"items = list(search.get_items())\n",
"signed_items = [planetary_computer.sign(item).to_dict() for item in items]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abcc3574-f7dd-48bb-bd60-436f708dd506",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d52a6b7f-091a-4c51-9ad3-9a48b204555f",
"metadata": {},
"outputs": [],
"source": [
"data = (\n",
" stackstac.stack(\n",
" signed_items,\n",
" assets=[\"B08\", \"B04\", \"B03\", \"B02\"],\n",
" chunksize=4096,\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .assign_coords(band=lambda x: x.common_name.rename(\"band\")) # use common names\n",
")\n",
"data"
]
},
{
"cell_type": "markdown",
"id": "050098bd-fa01-4ca3-b3fe-eee60e287dde",
"metadata": {},
"source": [
"This relies on the [`proj`](https://stac-extensions.github.io/) extension, which collects all the metadata necessary to know where in the DataArray each COG's data falls, without having to open any files.\n",
"\n",
"See also [odc-stac](https://odc-stac.readthedocs.io/en/latest/), which also uses STAC metadata to construct xarray DataArrays.\n",
"\n",
"## Earth systems data\n",
"\n",
"Now we'll work with some Earth Systems Science data using STAC. These datasets are typically higher-dimensional, and are stored as NetCDF or Zarr. We can still use STAC for data access.\n",
"\n",
"For example, terraclimate: https://planetarycomputer.microsoft.com/dataset/terraclimate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c648f4c4-d1af-403e-a6c1-9c57c6586465",
"metadata": {},
"outputs": [],
"source": [
"import pystac\n",
"\n",
"url = \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/terraclimate\"\n",
"collection = pystac.read_file(url)\n",
"collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbc09391-523a-4fed-9789-22aba6f448a8",
"metadata": {},
"outputs": [],
"source": [
"asset = collection.assets[\"zarr-https\"]\n",
"asset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bded90b4-72b4-498d-984e-fc722e38e4e9",
"metadata": {},
"outputs": [],
"source": [
"asset.extra_fields"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94b5d390-b6ea-45d4-b424-a092725b3aff",
"metadata": {},
"outputs": [],
"source": [
"import fsspec\n",
"import xarray as xr\n",
"\n",
"store = fsspec.get_mapper(asset.href)\n",
"ds = xr.open_zarr(store, **asset.extra_fields[\"xarray:open_kwargs\"])\n",
"ds"
]
},
{
"cell_type": "markdown",
"id": "77de03f5-30c1-493c-88ca-9b9f3875382e",
"metadata": {},
"source": [
"Key points:\n",
"\n",
"* We only needed to know the name of the collection and the name of the asset to access the data\n",
"* STAC is very flexible. We're using the [datacube](https://github.com/stac-extensions/datacube) and [xarray-assets](https://github.com/stac-extensions/xarray-assets) extensions"
]
},
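{
"cell_type": "markdown",
"id": "d9f3b2c4-3e6a-4a0c-9b5d-4c7e8f9a0b12",
"metadata": {},
"source": [
"As a brief aside (a sketch added here), the extension metadata driving this is plain JSON on the collection and asset objects. The `cube:dimensions` key below assumes the collection implements the datacube extension at the collection level; `xarray:open_kwargs` is the xarray-assets field we passed to `xr.open_zarr` above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea04c3d5-4f7b-4b1d-8c6e-5d8f9a0b1c23",
"metadata": {},
"outputs": [],
"source": [
"# Datacube extension: dimension metadata, if present on the collection\n",
"print(collection.extra_fields.get(\"cube:dimensions\", {}).keys())\n",
"\n",
"# xarray-assets extension: the open_kwargs used with xr.open_zarr above\n",
"asset.extra_fields[\"xarray:open_kwargs\"]"
]
},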
{
"cell_type": "code",
"execution_count": null,
"id": "41ea417c-4c65-409d-8839-c6fed0b8dfe5",
"metadata": {},
"outputs": [],
"source": [
"ds[\"vap\"].isel(time=0).sel(lon=slice(-125.67, -121.06), lat=slice(48.69, 46.16)).plot();"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2dd0374f-845b-4115-ba07-d8a24e5ea47a",
"metadata": {},
"outputs": [],
"source": [
"(\n",
" ds[\"vap\"].sel(lon=slice(-125.67, -121.06),\n",
" lat=slice(48.69, 46.16))\n",
" .mean(dim=[\"lat\", \"lon\"])\n",
" .rolling(time=12).mean()\n",
" .plot()\n",
");"
]
},
{
"cell_type": "markdown",
"id": "6ed01cbf-f284-4060-85cf-b62123a18aed",
"metadata": {},
"source": [
"## Tabular data\n",
"\n",
"Most tabular data on the Planetary Computer is stored as Parquet. For example, https://planetarycomputer.microsoft.com/dataset/us-census."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed16f712-f60c-4a8d-ac01-d59fab5d682d",
"metadata": {},
"outputs": [],
"source": [
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n",
")\n",
"search = catalog.search(collections=[\"us-census\"])\n",
"items = search.get_all_items()\n",
"items = {x.id: x for x in items}\n",
"list(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c2e66b1-cb8e-48f2-a791-90c058f81f72",
"metadata": {},
"outputs": [],
"source": [
"geo_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-geo\"].assets[\"data\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36bbe22f-5210-4fc2-b357-9465ed2b1f46",
"metadata": {},
"outputs": [],
"source": [
"import dask_geopandas\n",
"\n",
"geo = dask_geopandas.read_parquet(\n",
" geo_asset.href,\n",
" storage_options=geo_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"geo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ae2f4e5-d49a-4b78-bdc9-cc47ef2bd65c",
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe\n",
"\n",
"pop_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-population\"].assets[\"data\"]\n",
")\n",
"\n",
"pop = dask.dataframe.read_parquet(\n",
" pop_asset.href,\n",
" storage_options=pop_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"\n",
"df = geo.merge(pop)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b989a137-42fe-463f-9673-ab50e3a18419",
"metadata": {},
"outputs": [],
"source": [
"start = min(x for x in geo.divisions if x.startswith(\"44\"))\n",
"stop = \"4499\"\n",
"ri = df.loc[start:stop].compute()\n",
"ri = ri[ri.P0010001 >= 10]\n",
"ri.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "279a7a18-97c7-47f8-823e-44eadb128433",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import seaborn as sns\n",
"\n",
"sns.jointplot(x=ri.ALAND, y=np.log(ri.P0010001), marker=\".\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56e4cc59-80f2-4c4b-b9ef-f89087a62b70",
"metadata": {},
"outputs": [],
"source": [
"from folium.plugins import Fullscreen\n",
"\n",
"m = (\n",
" ri[[\"P0010001\", \"ALAND\", \"geometry\"]]\n",
" .loc[lambda df: df.P0010001 > 100]\n",
" .explore()\n",
")\n",
"\n",
"Fullscreen().add_to(m)\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "a26fb7b7-b057-4c06-be65-f38beeddcb7e",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* We're using STAC to catalog all of our data\n",
" - Remote sensing\n",
" - Earth Systems Science\n",
" - Tabular data\n",
"* STAC API for search\n",
"* Tools to convert from a collection of STAC items to a data structure (DataArray, DataFrame, ...)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
} |