-
-
Save etiennebr/9515738 to your computer and use it in GitHub Desktop.
#' Transform raster to data.table | |
#' | |
#' @param x Raster* object | |
#' @param row.names `NULL` or a character vector giving the row names for the data frame. Missing values are not allowed | |
#' @param optional logical. If `TRUE`, setting row names and converting column names (to syntactic names: see make.names) is optional | |
#' @param xy logical. If `TRUE`, also return the spatial coordinates | |
#' @param centroids logical. If TRUE return the centroids instead of all spatial coordinates (only relevant if xy=TRUE) | |
#' @param sepNA logical. If TRUE the parts of the spatial objects are separated by lines that are NA (only if xy=TRUE and, for polygons, if centroids=FALSE | |
#' @param ... Additional arguments (none) passed to `raster::as.data.frame` | |
#' | |
#' @value returns a data.table object | |
#' @examples | |
#' logo <- brick(system.file("external/rlogo.grd", package="raster")) | |
#' v <- as.data.table(logo) | |
#' @import | |
as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy=FALSE, inmem = canProcessInMemory(x, 2), ...) { | |
stopifnot(require("data.table")) | |
if(inmem) { | |
v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...)) | |
} else { | |
tr <- blockSize(x, n=2) | |
l <- lapply(1:tr$n, function(i) | |
as.data.table(as.data.frame(getValues(x, | |
row=tr$row[i], | |
nrows=tr$nrows[i]), | |
row.names=row.names, optional=optional, xy=xy, ...))) | |
v <- rbindlist(l) | |
} | |
coln <- names(x) | |
if(xy) coln <- c("x", "y", coln) | |
setnames(v, coln) | |
v | |
} | |
#' @param xy logical. If TRUE, the coordinates of each raster cell are included | |
#' @param cells logical. If TRUE, the cell numbers of each raster cell are included | |
#' @param na.rm logical. If TRUE, cells that have a NA value in at least one layer are removed | |
#' @param ... Additional arguments (none) passed to `terra::as.data.frame` | |
#' @value returns a data.table object | |
#' @examples | |
#' r <- rast(ncols=2, nrows=2) | |
#' values(r) <- 1:ncell(r) | |
#' as.data.table(r, xy = TRUE) | |
#' @importFrom terra as.data.frame | |
#' @importFrom data.table as.data.table | |
as.data.table.SpatRaster <- function(x, optional = FALSE, xy = FALSE, ...) { | |
stopifnot(require("data.table")) | |
v <- as.data.table(as.data.frame(x, optional = optional, xy = xy, ...)) | |
coln <- names(x) | |
if(xy) coln <- c("x", "y", coln) | |
setnames(v, coln) | |
v | |
} | |
if (!isGeneric("as.data.table")) { | |
setGeneric("as.data.table", function(x, ...) | |
standardGeneric("as.data.table")) | |
} | |
setMethod('as.data.table', signature(x='data.frame'), data.table::as.data.table) | |
# make sure you have terra or raster loaded (as needed) before | |
setMethod('as.data.table', signature(x='Raster'), as.data.table.raster) | |
setMethod('as.data.table', signature(x='SpatRaster'), as.data.table.SpatRaster) |
Etienne, just dropping by to thank you for this great function!
Awesome function! I wonder why is it NOT shipped in the raster package?
Recently came back to this function for an old workflow and keep running into memory allocation errors. Anyone else experience this? It is with large, but sparse rasters, lots of NA values over a large extent.
I haven't used it in a while, so I can't say. If you can provide a reproducible example I could have a look.
@etiennebr fantastic tool!
Question - do you know if it is possible to adapt it to work with terra?
The terra package has terra::as.data.frame()
function similar to raster::as.data.frame()
, but the parameters are different. Perhaps you have looked into this already?
Thanks @ptompalski! I haven't worked on this, but you're right it would be useful.
Maybe it's useful to take a look at Robert's feedback to an old request of mine here: rspatial/raster#55
Thanks @thiagoveloso, @ptompalski I updated the gist. Let me know if it works for you!
Updated to terra
as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy=FALSE, inmem = terra::inMemory(x), ...) {
stopifnot(require("data.table"))
if(inmem) {
v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...))
coln <- names(x)
if(xy) coln <- c("x", "y", coln)
setnames(v, coln)
} else {
tr <- blocks(x)
l <- lapply(1:tr$n, function(i) {
DT <- as.data.table(as.data.frame(terra::values(x, row = tr$row[i], nrows = tr$nrows[i]), ...))
if(xy == TRUE) {
cells <- terra::cellFromRowCol(x, c(tr$row[i], tr$row[i] + tr$nrows[i] - 1), c(1, ncol(x)))
coords <- terra::xyFromCell(x, cell = cells[1]:cells[2])
DT[, c("x", "y") := data.frame(terra::xyFromCell(x, cell = cells[1]:cells[2]))]
}
DT
})
v <- rbindlist(l)
coln <- names(x)
if(xy) {
coln <- c("x", "y", coln)
setcolorder(v, coln)
}
}
v
}
Hi Mike, thanks for sharing! The current gist works for me with terra
.
# after sourcing the gist
terra::rast(matrix(1, 2, 2)) %>% as.data.table(xy = TRUE)
#> x y lyr.1
#> 1: 0.5 1.5 1
#> 2: 1.5 1.5 1
#> 3: 0.5 0.5 1
#> 4: 1.5 0.5 1
But maybe there are some edge cases that are not supported. Let me know if there anything missing from the actual gist that didn't work for you on terra
.
Thanks @etiennebr for this super useful function! I used it successfully with terra. I did encounter an error when applying this function over a list of SpatRasters using purrr::map
, but not when using lapply
. A non-repro example:
# WORKS
dt_list <- lapply(spatrast_list, FUN = as.data.table)
# ERROR
dt_list <- spatrast_list %>%
map(., as.data.table())
# Error message: Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘as.data.table’ for signature ‘"missing"’
Thanks @tylerhoecker, it seems that the as.data.table
function should be called the same way than for lapply
. Could you try:
dt_list <- spatrast_list %>%
map(as.data.table)
# or
dt_list <- map(spatrast_list, as.data.table)
Hi @etiennebr, I am a bit confused about why the main body of code doesn't incorporate a solution to the issue Idemaz identified. Is there a particular reason? I am trying to work with a package that relied on the main code, understandably because they probably thought that it functioned correctly. It is causing issues because I am working with a large amount of raster data, and want the x and y columns. Is the solution provided in the comments the best solution?
Great function! This reduces the processing time for counting the number of cells in my raster with a particular value from 175 seconds using
raster::freq()
down to 1.7 seconds!