Pseudo-code summary of realistic NetCDF extraction for polygons
## 80 polygons collectively covering 18k cells in a [681,841] grid
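## shp_tab: the polygon layer, read earlier (not shown); an sf object from
## sf::read_sf() is assumed here - the exact source is not part of the original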
shp_spatial <- as(shp_tab, "Spatial")
## variable has 20k layers (modelled climate metric)
## 681, 841, 20587
filename <- "/path/to/hideous/monstrosity40Gb.nc"
## it's a regular grid, in longlat/WGS84
tbrick <- raster::brick(filename, quick = TRUE, varname = "monster1")
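## optional sanity check (a sketch): the comments above say the grid is
## [681, 841] with 20587 layers
stopifnot(all(dim(tbrick) == c(681, 841, 20587)))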
## build cell / polygon mapping
## devtools::install_github("hypertidy/tabularaster")
## we rely entirely on the grid being regular here; raster::cellFrom* is affine-only
cell <- tabularaster::cellnumbers(tbrick, shp_spatial)
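## 'cell' is a tibble with one row per polygon/cell pair: object_ is the
## polygon index, cell_ the cell number in the grid (~18k rows here)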
## guesstimate: ~3 hours for the traditional sequential loop over stack layers,
## extracting by cell number
# x <- vector("list", nlayers(tbrick))
# for (i in seq_len(nlayers(tbrick))) x[[i]] <- raster::extract(tbrick[[i]], cell$cell_)
library(future)
plan(multiprocess)
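## NOTE: in later versions of future, future_lapply() moved to the
## future.apply package, and plan(multisession) supersedes the now-deprecated
## plan(multiprocess)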
## function to be applied in parallel (managed by the future plan)
fun1 <- function(ilayer) {
  ## the main efficiency gain: cell numbers were pre-computed once (with a
  ## grouping ID per polygon), so each layer is a plain cell-number extract
  raster::extract(tbrick[[ilayer]], cell$cell_)
}
system.time({
  x <- future_lapply(seq_len(nlayers(tbrick)), fun1)
})
## 12 cores
##      user    system   elapsed
## 11839.320   808.560  1220.962
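## once x is computed, a sketch (assuming no NA cells) of summarizing per
## polygon: rowsum() sums the extracted values by the object_ grouping from
## cellnumbers(), and dividing by per-polygon cell counts gives means
n_per_poly <- as.vector(table(cell$object_))
poly_means <- vapply(x, function(v) rowsum(v, cell$object_)[, 1] / n_per_poly,
                     numeric(length(n_per_poly)))
## poly_means is an [n_polygons, n_layers] matrix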
## YMMV - much depends on the grid dimensions and their layout on disk.
## Single-cell, all-time-steps extraction is prohibitively slow in this
## particular case; see ?raster::writeRaster for the relevant terminology for
## raw binary layouts (BSQ, BIL, BIP). Package ff handles these layouts with
## ease, but converting 40 GB of climate model output is rarely a practical
## way to optimize extraction - i.e. the extraction scheme really matters.
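## for reference, a hedged sketch (untested; the filename, vmode and dimension
## order are assumptions, not from the original workflow) of mapping an
## existing raw BSQ file with ff - ff's dimorder argument is how it expresses
## BIL/BIP/BSQ-style layouts:
# library(ff)
# a <- ff(filename = "monster1_bsq.raw", vmode = "double",
#         dim = c(681, 841, 20587), readonly = TRUE)
# vals <- a[row_idx, col_idx, ]  ## all time steps for chosen cells, no 40 GB load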