Ryan Avery rbavery

@ljstrnadiii
ljstrnadiii / reproject_resample.py
Last active November 2, 2024 06:08
Resample/Reproject Cogs w/ Dask + Rioxarray
import logging
import subprocess as sp
import tempfile
import threading
import geopandas as gpd
import numpy as np
import rioxarray
import xarray as xr
from rasterio.enums import Resampling

Holy grail

Before diving too deeply into the various friction points of working with archives of earth observation data in xarray, let's look at a more favorable case from the earth systems world. In the notebook here we demonstrate how zarr's consolidated metadata option, which gathers the dimension and chunk reference information into a single metadata record, lets a massive dataset's dimensions and variables load extremely quickly. With this consolidated metadata available to reference chunks on disk, we can leverage xarray's dask integration: normal xarray operations lazily load chunks in parallel, and our calculations run on dask's blocked algorithm implementations. Gravy.

Challenges

But the earth observation story is more complicated... Not everything lives in standardized file containers and, more importantly, our grid coordinate systems are "all over the map" :] Here are some of the current challenges.

  1. Consolida
@KMarkert
KMarkert / gedi_to_csv.py
Created February 12, 2020 22:37
Python script to take GEDI level 2 data and convert variables to a CSV file. Usage `python gedi_to_csv.py <path> --variables [<var1>,<var2>,<var3>] --verbose`
import os
import fire
import h5py
import glob
import tqdm
import numpy as np
import pandas as pd
# requires h5py, tqdm, fire, numpy, and pandas to run