Skip to content

Instantly share code, notes, and snippets.

@TomAugspurger
Last active October 25, 2021 17:47
Show Gist options
  • Save TomAugspurger/bacdce8d9cf4ac209d899ea9e6845361 to your computer and use it in GitHub Desktop.
Save TomAugspurger/bacdce8d9cf4ac209d899ea9e6845361 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"cells": [
{
"cell_type": "markdown",
"id": "a588e7f9-5f3a-43b4-b31a-6171feef0690",
"metadata": {},
"source": [
"# Data Access with STAC\n",
"\n",
"This notebook demonstrates using the Planetary Computer's STAC API to find and load data of interest."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91efaeaf-14f7-47f9-8e42-101572480215",
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"\n",
"client = Client()\n",
"client"
]
},
{
"cell_type": "markdown",
"id": "fe2289ea-fd87-4242-bd81-9eec607a9917",
"metadata": {},
"source": [
"## Remote sensing\n",
"\n",
"This demonstrates the motivating use-case for STAC. Given a massive collection of COGs, for for example, [Sentinel-2 Level-2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a), find just the ones that match some particular conditions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d94a8e7-ba8d-47e1-99ce-01fbf3b95a1b",
"metadata": {},
"outputs": [],
"source": [
"import pystac_client\n",
"\n",
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
")\n",
"\n",
"search = catalog.search(\n",
" collections=[\"sentinel-2-l2a\"],\n",
" intersects={\n",
" \"type\": \"Polygon\",\n",
" \"coordinates\": [\n",
" [\n",
" [-148.56536865234375, 60.80072385643073],\n",
" [-147.44338989257812, 60.80072385643073],\n",
" [-147.44338989257812, 61.18363894915102],\n",
" [-148.56536865234375, 61.18363894915102],\n",
" [-148.56536865234375, 60.80072385643073],\n",
" ]\n",
" ],\n",
" },\n",
" datetime=\"2019-06-01/2019-08-01\",\n",
" query={\"eo:cloud_cover\": {\"lt\": 10}}\n",
")\n",
"\n",
"%time items = list(search.get_all_items())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2eab338-3a6f-4d67-8664-d6e7548f783c",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce74b616-1411-4e8c-ba7c-dd2698e8b784",
"metadata": {},
"outputs": [],
"source": [
"import geopandas\n",
"from folium.features import GeoJson\n",
"\n",
"gdf = geopandas.GeoDataFrame.from_features(\n",
" [item.to_dict() for item in items],\n",
" columns=[\"geometry\", \"datetime\", \"eo:cloud_cover\",],\n",
").set_crs(4326)\n",
"gdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de70fd08-f9c7-4fb7-8812-7902d25c7d98",
"metadata": {},
"outputs": [],
"source": [
"m = gdf.explore()\n",
"\n",
"layer = GeoJson(\n",
" data=search.get_parameters()[\"intersects\"],\n",
" style_function=lambda x: {\"color\": \"yellow\"}\n",
")\n",
"layer.add_to(m)\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee6ab434-768f-4ab7-a7a5-6f1126658c85",
"metadata": {},
"outputs": [],
"source": [
"from pystac.extensions.eo import EOExtension as eo\n",
"least_cloudy_item = min(items, key=lambda item: eo.ext(item).cloud_cover)"
]
},
{
"cell_type": "markdown",
"id": "d4f4618d-2114-4fc2-a615-39020abcf0a9",
"metadata": {},
"source": [
"The Planetary Computer requires that you \"sign\" your items / assets before opening them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3c0411d-4741-43f3-b793-d483e524f800",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"r = requests.get(least_cloudy_item.assets[\"B02\"].href)\n",
"r.status_code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faf479a6-3e70-4e3d-92b1-3f6de31082f7",
"metadata": {},
"outputs": [],
"source": [
"import planetary_computer\n",
"\n",
"signed_item = planetary_computer.sign(least_cloudy_item)"
]
},
{
"cell_type": "markdown",
"id": "c3f5b230-330c-409f-960b-4077fbd2fbaf",
"metadata": {},
"source": [
"Signing an item appends a short-lived, read-only SAS token to each asset's URL. This helps us keep a handle on what data is being used, and most importantly keep a handle on our egress costs.\n",
"\n",
"You can sign assets anonymously, so you don't need a Planetary Computer account to use the data (or STAC API). "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4eb0ba13-8b5e-4712-a4f1-d0753673fa03",
"metadata": {},
"outputs": [],
"source": [
"import stackstac\n",
"\n",
"da = (\n",
" stackstac.stack(\n",
" signed_item.to_dict(),\n",
" assets=[\"B02\", \"B03\", \"B04\", \"B08\"]\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .squeeze()\n",
")\n",
"da"
]
},
{
"cell_type": "markdown",
"id": "2c37e83d-c451-4e47-bbee-a7cc2c5b30ac",
"metadata": {},
"source": [
"Now we can use xarray's nice high-level APIs for analysis (e.g. band math, to compute [NDVI](https://www.usgs.gov/core-science-systems/eros/phenology/science/ndvi-foundation-remote-sensing-phenology?qt-science_center_objects=0#))."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a94846-88bf-4834-ab03-eb19c5f95b9c",
"metadata": {},
"outputs": [],
"source": [
"red = da.sel(band=\"B04\")\n",
"nir = da.sel(band=\"B08\")\n",
"\n",
"ndvi = (nir - red) / (nir + red)\n",
"ndvi"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7068913d-4f22-4839-9373-484c393f0706",
"metadata": {},
"outputs": [],
"source": [
"ndvi = ndvi.persist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72d7e5b3-2387-4d4d-90ef-467d79d0b3bf",
"metadata": {},
"outputs": [],
"source": [
"from ipyleaflet import FullScreenControl\n",
"\n",
"m = stackstac.show(ndvi, range=[-0.6, 0.6])\n",
"m.scroll_wheel_zoom = True\n",
"m.add_control(FullScreenControl())\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "87cdb8ba-97d8-4152-bd4e-0eaee331b3bc",
"metadata": {},
"source": [
"This approach is especially useful when building a datacube with multiple time-steps and multiple scenes in the `x` or `y` direction. `stacstack` knows how to arrange these items in space and time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d860f15c-cad7-401f-a688-0ddf8e04d824",
"metadata": {},
"outputs": [],
"source": [
"search = catalog.search(\n",
" bbox=[-122.20, 47.50, -121.99, 47.80],\n",
" datetime=\"2016-01-01/2020-12-31\",\n",
" collections=[\"sentinel-2-l2a\"],\n",
" limit=500, # fetch items in batches of 500\n",
" query={\"eo:cloud_cover\": {\"lt\": 25}},\n",
")\n",
"\n",
"items = list(search.get_items())\n",
"signed_items = [planetary_computer.sign(item).to_dict() for item in items]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abcc3574-f7dd-48bb-bd60-436f708dd506",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d52a6b7f-091a-4c51-9ad3-9a48b204555f",
"metadata": {},
"outputs": [],
"source": [
"data = (\n",
" stackstac.stack(\n",
" signed_items,\n",
" assets=[\"B08\", \"B04\", \"B03\", \"B02\"],\n",
" chunksize=4096,\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .assign_coords(band=lambda x: x.common_name.rename(\"band\")) # use common names\n",
")\n",
"data"
]
},
{
"cell_type": "markdown",
"id": "050098bd-fa01-4ca3-b3fe-eee60e287dde",
"metadata": {},
"source": [
"This is relying on the [`proj`](https://stac-extensions.github.io/) extension, which collects all the metadata necessary to know where in the DataArray each COG's data falls, without having to open any files.\n",
"\n",
"See also [odc-stac](https://odc-stac.readthedocs.io/en/latest/), which also uses STAC metadata to construct xarray DataArrays.\n",
"\n",
"## Earth systems data\n",
"\n",
"Now we'll work with some Earth Systems Science data using STAC. These datasets are typically higher-dimensional, and are stored as NetCDF or Zarr. We can still use STAC for data access.\n",
"\n",
"For example, terraclimate: https://planetarycomputer.microsoft.com/dataset/terraclimate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c648f4c4-d1af-403e-a6c1-9c57c6586465",
"metadata": {},
"outputs": [],
"source": [
"import pystac\n",
"\n",
"url = \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/terraclimate\"\n",
"collection = pystac.read_file(url)\n",
"collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbc09391-523a-4fed-9789-22aba6f448a8",
"metadata": {},
"outputs": [],
"source": [
"asset = collection.assets[\"zarr-https\"]\n",
"asset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bded90b4-72b4-498d-984e-fc722e38e4e9",
"metadata": {},
"outputs": [],
"source": [
"asset.extra_fields"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94b5d390-b6ea-45d4-b424-a092725b3aff",
"metadata": {},
"outputs": [],
"source": [
"import fsspec\n",
"import xarray as xr\n",
"\n",
"store = fsspec.get_mapper(asset.href)\n",
"ds = xr.open_zarr(store, **asset.extra_fields[\"xarray:open_kwargs\"])\n",
"ds"
]
},
{
"cell_type": "markdown",
"id": "77de03f5-30c1-493c-88ca-9b9f3875382e",
"metadata": {},
"source": [
"Key points:\n",
"\n",
"* We only needed to know the name of the collection and the name of the asset to access the data\n",
"* STAC is very flexible. We're using the [datacube](https://github.com/stac-extensions/datacube) and [xarray-assets](https://github.com/stac-extensions/xarray-assets) extensions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41ea417c-4c65-409d-8839-c6fed0b8dfe5",
"metadata": {},
"outputs": [],
"source": [
"ds[\"vap\"].isel(time=0).sel(lon=slice(-125.67, -121.06), lat=slice(48.69, 46.16)).plot();"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2dd0374f-845b-4115-ba07-d8a24e5ea47a",
"metadata": {},
"outputs": [],
"source": [
"(\n",
" ds[\"vap\"].sel(lon=slice(-125.67, -121.06),\n",
" lat=slice(48.69, 46.16))\n",
" .mean(dim=[\"lat\", \"lon\"])\n",
" .rolling(time=12).mean()\n",
" .plot()\n",
");"
]
},
{
"cell_type": "markdown",
"id": "6ed01cbf-f284-4060-85cf-b62123a18aed",
"metadata": {},
"source": [
"## Tabular data\n",
"\n",
"Mostly stored in parquet. For example, https://planetarycomputer.microsoft.com/dataset/us-census."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed16f712-f60c-4a8d-ac01-d59fab5d682d",
"metadata": {},
"outputs": [],
"source": [
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n",
")\n",
"search = catalog.search(collections=[\"us-census\"])\n",
"items = search.get_all_items()\n",
"items = {x.id: x for x in items}\n",
"list(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c2e66b1-cb8e-48f2-a791-90c058f81f72",
"metadata": {},
"outputs": [],
"source": [
"geo_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-geo\"].assets[\"data\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36bbe22f-5210-4fc2-b357-9465ed2b1f46",
"metadata": {},
"outputs": [],
"source": [
"import dask_geopandas\n",
"\n",
"geo = dask_geopandas.read_parquet(\n",
" geo_asset.href,\n",
" storage_options=geo_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"geo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ae2f4e5-d49a-4b78-bdc9-cc47ef2bd65c",
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe\n",
"\n",
"pop_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-population\"].assets[\"data\"]\n",
")\n",
"\n",
"pop = dask.dataframe.read_parquet(\n",
" pop_asset.href,\n",
" storage_options=pop_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"\n",
"df = geo.merge(pop)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b989a137-42fe-463f-9673-ab50e3a18419",
"metadata": {},
"outputs": [],
"source": [
"start = min(x for x in geo.divisions if x.startswith(\"44\"))\n",
"stop = \"4499\"\n",
"ri = df.loc[start:stop].compute()\n",
"ri = ri[ri.P0010001 >= 10]\n",
"ri.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "279a7a18-97c7-47f8-823e-44eadb128433",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import seaborn as sns\n",
"\n",
"sns.jointplot(x=ri.ALAND, y=np.log(ri.P0010001), marker=\".\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56e4cc59-80f2-4c4b-b9ef-f89087a62b70",
"metadata": {},
"outputs": [],
"source": [
"from folium.plugins import Fullscreen\n",
"\n",
"m = (\n",
" ri[[\"P0010001\", \"ALAND\", \"geometry\"]]\n",
" .loc[lambda df: df.P0010001 > 100]\n",
" .explore()\n",
")\n",
"\n",
"Fullscreen().add_to(m)\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "a26fb7b7-b057-4c06-be65-f38beeddcb7e",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* We're using STAC to catalog all of our data\n",
" - Remote sensing\n",
" - Earth Systems Science\n",
" - Tabular data\n",
"* STAC API for search\n",
"* Tools to convert from a collection of STAC items to a data structure (DataArray, DataFrame, ...)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment