@KMarkert
Created August 27, 2024 20:11
ee_stac_crawl.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"private_outputs": true,
"provenance": [],
"authorship_tag": "ABX9TyPTSEX5O0L3hyRKJo2/IxSu",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/KMarkert/e127bf79096340fb52580bbe198030b5/ee_stac_crawl.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"This notebook accesses the SpatioTemporal Asset Catalog (STAC) hosted by Google Earth Engine and filters it for datasets released under a non-commercial license (CC-BY-NC-4.0 or CC-BY-NC-SA-4.0)."
],
"metadata": {
"id": "Dsyz-JPPb9UR"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "w9OAw1w_O1Cu"
},
"outputs": [],
"source": [
"!pip install pystac"
]
},
{
"cell_type": "code",
"source": [
"from pystac import Catalog\n",
"import warnings\n",
"\n",
"warnings.filterwarnings('ignore')"
],
"metadata": {
"id": "l8Vbc-K2O3Av"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# URL of the Earth Engine STAC root catalog (JSON)\n",
"URL = 'https://earthengine-stac.storage.googleapis.com/catalog/catalog.json'"
],
"metadata": {
"id": "ve0TM4aGPDpR"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"root_catalog = Catalog.from_file(URL)"
],
"metadata": {
"id": "SvDoWm7bQZ0i"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# get a list of all the collections in the catalog\n",
"# there are 1000+ collections, so this takes a while\n",
"collections = list(root_catalog.get_all_collections())"
],
"metadata": {
"id": "RiuSlIVsQhug"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# filter the collections to those that have non-commercial licenses\n",
"NC_LICENSES = {'CC-BY-NC-4.0', 'CC-BY-NC-SA-4.0'}\n",
"nc_cols = [c for c in collections if c.license in NC_LICENSES]"
],
"metadata": {
"id": "7n_oGCZyQlX9"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# display the list of non-commercial collections\n",
"nc_cols"
],
"metadata": {
"id": "Cd8z415NRUMQ"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"`nc_cols` is a list of `pystac` `Collection` objects, so you can inspect or process each element further if needed (for example, its `id`, `title`, or `license`). See the [`pystac` documentation](https://pystac.readthedocs.io/en/stable/index.html) for details on working with collection objects."
],
"metadata": {
"id": "Xoe0UbjNc9oa"
}
},
{
"cell_type": "code",
"source": [],
"metadata": {
"id": "l9u3tGa3R726"
},
"execution_count": null,
"outputs": []
}
]
}
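For readers who want to see what the crawl involves without `pystac`, here is a minimal standard-library sketch of roughly the same walk that `get_all_collections()` performs, followed by the notebook's license filter. It assumes the catalog documents follow the STAC spec: child catalogs and collections are reached through `links` entries whose `rel` is `"child"`, and each collection carries `"type": "Collection"` and a `license` field. The helper names (`fetch_json`, `iter_collections`, `non_commercial_ids`) are hypothetical, and the `load` parameter exists only so the walk can be tried against an in-memory dict instead of the network.

```python
import json
from typing import Callable, Dict, Iterator, List, Tuple
from urllib.parse import urljoin
from urllib.request import urlopen

# Hypothetical stdlib-only sketch; these helpers are not part of pystac.
# License identifiers treated as non-commercial, matching the notebook's filter.
NC_LICENSES = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0"}

Loader = Callable[[str], Dict]


def fetch_json(url: str) -> Dict:
    """Download and parse a single STAC JSON document."""
    with urlopen(url) as resp:
        return json.load(resp)


def iter_collections(root_url: str, load: Loader = fetch_json) -> Iterator[Tuple[str, Dict]]:
    """Depth-first walk over rel="child" links, yielding (url, document) for each Collection."""
    stack = [root_url]
    seen = set()
    while stack:
        url = stack.pop()
        if url in seen:  # skip documents already visited (duplicate links or cycles)
            continue
        seen.add(url)
        doc = load(url)
        if doc.get("type") == "Collection":
            yield url, doc
        for link in doc.get("links", []):
            if link.get("rel") == "child":
                # STAC hrefs may be relative, so resolve them against the current document
                stack.append(urljoin(url, link["href"]))


def non_commercial_ids(root_url: str, load: Loader = fetch_json) -> List[str]:
    """Return the ids of collections whose license is in NC_LICENSES."""
    return [
        doc["id"]
        for _, doc in iter_collections(root_url, load)
        if doc.get("license") in NC_LICENSES
    ]
```

Against the live catalog this would be called as `non_commercial_ids(URL)` with the notebook's `URL`. It fetches one document at a time with no caching, retries, or error handling, so over 1000+ collections it will be slow; `pystac` remains the more convenient route.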