{
"cells": [
{
"cell_type": "markdown",
"id": "a588e7f9-5f3a-43b4-b31a-6171feef0690",
"metadata": {},
"source": [
"# Data Access with STAC\n",
"\n",
"This notebook demonstrates using the Planetary Computer's STAC API to find and load data of interest."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "91efaeaf-14f7-47f9-8e42-101572480215",
"metadata": {},
"outputs": [],
"source": [
"from dask.distributed import Client\n",
"\n",
"client = Client()\n",
"client"
]
},
{
"cell_type": "markdown",
"id": "fe2289ea-fd87-4242-bd81-9eec607a9917",
"metadata": {},
"source": [
"## Remote sensing\n",
"\n",
"This demonstrates the motivating use-case for STAC. Given a massive collection of COGs, for example [Sentinel-2 Level-2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a), find just the ones that match some particular conditions."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d94a8e7-ba8d-47e1-99ce-01fbf3b95a1b",
"metadata": {},
"outputs": [],
"source": [
"import pystac_client\n",
"\n",
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\"\n",
")\n",
"\n",
"search = catalog.search(\n",
" collections=[\"sentinel-2-l2a\"],\n",
" intersects={\n",
" \"type\": \"Polygon\",\n",
" \"coordinates\": [\n",
" [\n",
" [-148.56536865234375, 60.80072385643073],\n",
" [-147.44338989257812, 60.80072385643073],\n",
" [-147.44338989257812, 61.18363894915102],\n",
" [-148.56536865234375, 61.18363894915102],\n",
" [-148.56536865234375, 60.80072385643073],\n",
" ]\n",
" ],\n",
" },\n",
" datetime=\"2019-06-01/2019-08-01\",\n",
" query={\"eo:cloud_cover\": {\"lt\": 10}}\n",
")\n",
"\n",
"%time items = list(search.get_all_items())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e2eab338-3a6f-4d67-8664-d6e7548f783c",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ce74b616-1411-4e8c-ba7c-dd2698e8b784",
"metadata": {},
"outputs": [],
"source": [
"import geopandas\n",
"from folium.features import GeoJson\n",
"\n",
"gdf = geopandas.GeoDataFrame.from_features(\n",
" [item.to_dict() for item in items],\n",
" columns=[\"geometry\", \"datetime\", \"eo:cloud_cover\"],\n",
").set_crs(4326)\n",
"gdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "de70fd08-f9c7-4fb7-8812-7902d25c7d98",
"metadata": {},
"outputs": [],
"source": [
"m = gdf.explore()\n",
"\n",
"layer = GeoJson(\n",
" data=search.get_parameters()[\"intersects\"],\n",
" style_function=lambda x: {\"color\": \"yellow\"}\n",
")\n",
"layer.add_to(m)\n",
"m"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ee6ab434-768f-4ab7-a7a5-6f1126658c85",
"metadata": {},
"outputs": [],
"source": [
"from pystac.extensions.eo import EOExtension as eo\n",
"\n",
"least_cloudy_item = min(items, key=lambda item: eo.ext(item).cloud_cover)"
]
},
{
"cell_type": "markdown",
"id": "d4f4618d-2114-4fc2-a615-39020abcf0a9",
"metadata": {},
"source": [
"The Planetary Computer requires that you \"sign\" your items / assets before opening them."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3c0411d-4741-43f3-b793-d483e524f800",
"metadata": {},
"outputs": [],
"source": [
"import requests\n",
"\n",
"r = requests.get(least_cloudy_item.assets[\"B02\"].href)\n",
"r.status_code"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "faf479a6-3e70-4e3d-92b1-3f6de31082f7",
"metadata": {},
"outputs": [],
"source": [
"import planetary_computer\n",
"\n",
"signed_item = planetary_computer.sign(least_cloudy_item)"
]
},
{
"cell_type": "markdown",
"id": "c3f5b230-330c-409f-960b-4077fbd2fbaf",
"metadata": {},
"source": [
"Signing an item appends a short-lived, read-only SAS token to each asset's URL. This helps us track what data is being used and, most importantly, control our egress costs.\n",
"\n",
"You can sign assets anonymously, so you don't need a Planetary Computer account to use the data (or the STAC API)."
]
},
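{
"cell_type": "markdown",
"id": "b7d1f0a2-1c4e-4e8a-9f3b-2a5c6d7e8f90",
"metadata": {},
"source": [
"As a quick sanity check (a sketch added here, not part of the original workflow), the signed href should be the unsigned href with the SAS token appended as a query string."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c8e2a1b3-2d5f-4f9b-8a4c-3b6d7e8f9a01",
"metadata": {},
"outputs": [],
"source": [
"# Sketch: signing leaves the base URL unchanged and appends a SAS token.\n",
"unsigned_href = least_cloudy_item.assets[\"B02\"].href\n",
"signed_href = signed_item.assets[\"B02\"].href\n",
"\n",
"print(signed_href.startswith(unsigned_href))  # base URL unchanged\n",
"print(\"?\" in signed_href)  # SAS token appended as query parameters"
]
},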
{
"cell_type": "code",
"execution_count": null,
"id": "4eb0ba13-8b5e-4712-a4f1-d0753673fa03",
"metadata": {},
"outputs": [],
"source": [
"import stackstac\n",
"\n",
"da = (\n",
" stackstac.stack(\n",
" signed_item.to_dict(),\n",
" assets=[\"B02\", \"B03\", \"B04\", \"B08\"]\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .squeeze()\n",
")\n",
"da"
]
},
{
"cell_type": "markdown",
"id": "2c37e83d-c451-4e47-bbee-a7cc2c5b30ac",
"metadata": {},
"source": [
"Now we can use xarray's high-level APIs for analysis (e.g. band math, to compute [NDVI](https://www.usgs.gov/core-science-systems/eros/phenology/science/ndvi-foundation-remote-sensing-phenology?qt-science_center_objects=0#))."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f8a94846-88bf-4834-ab03-eb19c5f95b9c",
"metadata": {},
"outputs": [],
"source": [
"red = da.sel(band=\"B04\")\n",
"nir = da.sel(band=\"B08\")\n",
"\n",
"ndvi = (nir - red) / (nir + red)\n",
"ndvi"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7068913d-4f22-4839-9373-484c393f0706",
"metadata": {},
"outputs": [],
"source": [
"ndvi = ndvi.persist()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "72d7e5b3-2387-4d4d-90ef-467d79d0b3bf",
"metadata": {},
"outputs": [],
"source": [
"from ipyleaflet import FullScreenControl\n",
"\n",
"m = stackstac.show(ndvi, range=[-0.6, 0.6])\n",
"m.scroll_wheel_zoom = True\n",
"m.add_control(FullScreenControl())\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "87cdb8ba-97d8-4152-bd4e-0eaee331b3bc",
"metadata": {},
"source": [
"This approach is especially useful when building a datacube with multiple time-steps and multiple scenes in the `x` or `y` direction. `stackstac` knows how to arrange these items in space and time."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d860f15c-cad7-401f-a688-0ddf8e04d824",
"metadata": {},
"outputs": [],
"source": [
"search = catalog.search(\n",
" bbox=[-122.20, 47.50, -121.99, 47.80],\n",
" datetime=\"2016-01-01/2020-12-31\",\n",
" collections=[\"sentinel-2-l2a\"],\n",
" limit=500, # fetch items in batches of 500\n",
" query={\"eo:cloud_cover\": {\"lt\": 25}},\n",
")\n",
"\n",
"items = list(search.get_items())\n",
"signed_items = [planetary_computer.sign(item).to_dict() for item in items]"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "abcc3574-f7dd-48bb-bd60-436f708dd506",
"metadata": {},
"outputs": [],
"source": [
"len(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d52a6b7f-091a-4c51-9ad3-9a48b204555f",
"metadata": {},
"outputs": [],
"source": [
"data = (\n",
" stackstac.stack(\n",
" signed_items,\n",
" assets=[\"B08\", \"B04\", \"B03\", \"B02\"],\n",
" chunksize=4096,\n",
" )\n",
" .where(lambda x: x > 0)\n",
" .assign_coords(band=lambda x: x.common_name.rename(\"band\")) # use common names\n",
")\n",
"data"
]
},
{
"cell_type": "markdown",
"id": "050098bd-fa01-4ca3-b3fe-eee60e287dde",
"metadata": {},
"source": [
"This relies on the [`proj`](https://stac-extensions.github.io/) extension, which collects all the metadata necessary to know where in the DataArray each COG's data falls, without having to open any files.\n",
"\n",
"See also [odc-stac](https://odc-stac.readthedocs.io/en/latest/), which also uses STAC metadata to construct xarray DataArrays.\n",
"\n",
"## Earth systems data\n",
"\n",
"Now we'll work with some Earth Systems Science data using STAC. These datasets are typically higher-dimensional, and are stored as NetCDF or Zarr. We can still use STAC for data access.\n",
"\n",
"For example, terraclimate: https://planetarycomputer.microsoft.com/dataset/terraclimate"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c648f4c4-d1af-403e-a6c1-9c57c6586465",
"metadata": {},
"outputs": [],
"source": [
"import pystac\n",
"\n",
"url = \"https://planetarycomputer.microsoft.com/api/stac/v1/collections/terraclimate\"\n",
"collection = pystac.read_file(url)\n",
"collection"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbc09391-523a-4fed-9789-22aba6f448a8",
"metadata": {},
"outputs": [],
"source": [
"asset = collection.assets[\"zarr-https\"]\n",
"asset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bded90b4-72b4-498d-984e-fc722e38e4e9",
"metadata": {},
"outputs": [],
"source": [
"asset.extra_fields"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "94b5d390-b6ea-45d4-b424-a092725b3aff",
"metadata": {},
"outputs": [],
"source": [
"import fsspec\n",
"import xarray as xr\n",
"\n",
"store = fsspec.get_mapper(asset.href)\n",
"ds = xr.open_zarr(store, **asset.extra_fields[\"xarray:open_kwargs\"])\n",
"ds"
]
},
{
"cell_type": "markdown",
"id": "77de03f5-30c1-493c-88ca-9b9f3875382e",
"metadata": {},
"source": [
"Key points:\n",
"\n",
"* We only needed to know the name of the collection and the name of the asset to access the data\n",
"* STAC is very flexible. We're using the [datacube](https://github.com/stac-extensions/datacube) and [xarray-assets](https://github.com/stac-extensions/xarray-assets) extensions"
]
},
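{
"cell_type": "markdown",
"id": "d9f3b2c4-3e6a-4a0c-9b5d-4c7e8f9a0b12",
"metadata": {},
"source": [
"As a brief aside (a sketch added here), the extension metadata driving this is plain JSON on the collection and asset objects. The `cube:dimensions` key below assumes the collection implements the datacube extension at the collection level; `xarray:open_kwargs` is the xarray-assets field we passed to `xr.open_zarr` above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea04c3d5-4f7b-4b1d-8c6e-5d8f9a0b1c23",
"metadata": {},
"outputs": [],
"source": [
"# Datacube extension: dimension metadata, if present on the collection\n",
"print(collection.extra_fields.get(\"cube:dimensions\", {}).keys())\n",
"\n",
"# xarray-assets extension: the open_kwargs used with xr.open_zarr above\n",
"asset.extra_fields[\"xarray:open_kwargs\"]"
]
},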
{
"cell_type": "code",
"execution_count": null,
"id": "41ea417c-4c65-409d-8839-c6fed0b8dfe5",
"metadata": {},
"outputs": [],
"source": [
"ds[\"vap\"].isel(time=0).sel(lon=slice(-125.67, -121.06), lat=slice(48.69, 46.16)).plot();"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2dd0374f-845b-4115-ba07-d8a24e5ea47a",
"metadata": {},
"outputs": [],
"source": [
"(\n",
" ds[\"vap\"].sel(lon=slice(-125.67, -121.06),\n",
" lat=slice(48.69, 46.16))\n",
" .mean(dim=[\"lat\", \"lon\"])\n",
" .rolling(time=12).mean()\n",
" .plot()\n",
");"
]
},
{
"cell_type": "markdown",
"id": "6ed01cbf-f284-4060-85cf-b62123a18aed",
"metadata": {},
"source": [
"## Tabular data\n",
"\n",
"Most tabular data on the Planetary Computer is stored as Parquet. For example, https://planetarycomputer.microsoft.com/dataset/us-census."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed16f712-f60c-4a8d-ac01-d59fab5d682d",
"metadata": {},
"outputs": [],
"source": [
"catalog = pystac_client.Client.open(\n",
" \"https://planetarycomputer.microsoft.com/api/stac/v1\",\n",
")\n",
"search = catalog.search(collections=[\"us-census\"])\n",
"items = search.get_all_items()\n",
"items = {x.id: x for x in items}\n",
"list(items)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8c2e66b1-cb8e-48f2-a791-90c058f81f72",
"metadata": {},
"outputs": [],
"source": [
"geo_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-geo\"].assets[\"data\"]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "36bbe22f-5210-4fc2-b357-9465ed2b1f46",
"metadata": {},
"outputs": [],
"source": [
"import dask_geopandas\n",
"\n",
"geo = dask_geopandas.read_parquet(\n",
" geo_asset.href,\n",
" storage_options=geo_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"geo"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ae2f4e5-d49a-4b78-bdc9-cc47ef2bd65c",
"metadata": {},
"outputs": [],
"source": [
"import dask.dataframe\n",
"\n",
"pop_asset = planetary_computer.sign(\n",
" items[\"2020-census-blocks-population\"].assets[\"data\"]\n",
")\n",
"\n",
"pop = dask.dataframe.read_parquet(\n",
" pop_asset.href,\n",
" storage_options=pop_asset.extra_fields[\"table:storage_options\"],\n",
")\n",
"\n",
"df = geo.merge(pop)\n",
"df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b989a137-42fe-463f-9673-ab50e3a18419",
"metadata": {},
"outputs": [],
"source": [
"start = min(x for x in geo.divisions if x.startswith(\"44\"))\n",
"stop = \"4499\"\n",
"ri = df.loc[start:stop].compute()\n",
"ri = ri[ri.P0010001 >= 10]\n",
"ri.head()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "279a7a18-97c7-47f8-823e-44eadb128433",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import seaborn as sns\n",
"\n",
"sns.jointplot(x=ri.ALAND, y=np.log(ri.P0010001), marker=\".\");"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "56e4cc59-80f2-4c4b-b9ef-f89087a62b70",
"metadata": {},
"outputs": [],
"source": [
"from folium.plugins import Fullscreen\n",
"\n",
"m = (\n",
" ri[[\"P0010001\", \"ALAND\", \"geometry\"]]\n",
" .loc[lambda df: df.P0010001 > 100]\n",
" .explore()\n",
")\n",
"\n",
"Fullscreen().add_to(m)\n",
"m"
]
},
{
"cell_type": "markdown",
"id": "a26fb7b7-b057-4c06-be65-f38beeddcb7e",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* We're using STAC to catalog all of our data\n",
" - Remote sensing\n",
" - Earth Systems Science\n",
" - Tabular data\n",
"* STAC API for search\n",
"* Tools to convert from a collection of STAC items to a data structure (DataArray, DataFrame, ...)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
} |