Skip to content

Instantly share code, notes, and snippets.

@etiennebr
Last active July 23, 2024 23:44
Show Gist options
  • Save etiennebr/9515738 to your computer and use it in GitHub Desktop.
Save etiennebr/9515738 to your computer and use it in GitHub Desktop.
Transform raster or terra object to data.table
#' Transform raster to data.table
#'
#' @param x Raster* object
#' @param row.names `NULL` or a character vector giving the row names for the data frame. Missing values are not allowed
#' @param optional logical. If `TRUE`, setting row names and converting column names (to syntactic names: see make.names) is optional
#' @param xy logical. If `TRUE`, also return the spatial coordinates
#' @param centroids logical. If TRUE return the centroids instead of all spatial coordinates (only relevant if xy=TRUE)
#' @param sepNA logical. If TRUE the parts of the spatial objects are separated by lines that are NA (only if xy=TRUE and, for polygons, if centroids=FALSE
#' @param ... Additional arguments (none) passed to `raster::as.data.frame`
#'
#' @value returns a data.table object
#' @examples
#' logo <- brick(system.file("external/rlogo.grd", package="raster"))
#' v <- as.data.table(logo)
#' @import
as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy=FALSE, inmem = canProcessInMemory(x, 2), ...) {
stopifnot(require("data.table"))
if(inmem) {
v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...))
} else {
tr <- blockSize(x, n=2)
l <- lapply(1:tr$n, function(i)
as.data.table(as.data.frame(getValues(x,
row=tr$row[i],
nrows=tr$nrows[i]),
row.names=row.names, optional=optional, xy=xy, ...)))
v <- rbindlist(l)
}
coln <- names(x)
if(xy) coln <- c("x", "y", coln)
setnames(v, coln)
v
}
#' @param xy logical. If TRUE, the coordinates of each raster cell are included
#' @param cells logical. If TRUE, the cell numbers of each raster cell are included
#' @param na.rm logical. If TRUE, cells that have a NA value in at least one layer are removed
#' @param ... Additional arguments (none) passed to `terra::as.data.frame`
#' @value returns a data.table object
#' @examples
#' r <- rast(ncols=2, nrows=2)
#' values(r) <- 1:ncell(r)
#' as.data.table(r, xy = TRUE)
#' @importFrom terra as.data.frame
#' @importFrom data.table as.data.table
as.data.table.SpatRaster <- function(x, optional = FALSE, xy = FALSE, ...) {
stopifnot(require("data.table"))
v <- as.data.table(as.data.frame(x, optional = optional, xy = xy, ...))
coln <- names(x)
if(xy) coln <- c("x", "y", coln)
setnames(v, coln)
v
}
if (!isGeneric("as.data.table")) {
setGeneric("as.data.table", function(x, ...)
standardGeneric("as.data.table"))
}
setMethod('as.data.table', signature(x='data.frame'), data.table::as.data.table)
# make sure you have terra or raster loaded (as needed) before
setMethod('as.data.table', signature(x='Raster'), as.data.table.raster)
setMethod('as.data.table', signature(x='SpatRaster'), as.data.table.SpatRaster)
@etiennebr
Copy link
Author

I haven't used it in a while, so I can't say. If you can provide a reproducible example I could have a look.

@ptompalski
Copy link

@etiennebr fantastic tool!
Question - do you know if it is possible to adapt it to work with terra?
The terra package has terra::as.data.frame() function similar to raster::as.data.frame(), but the parameters are different. Perhaps you have looked into this already?

@etiennebr
Copy link
Author

Thanks @ptompalski! I haven't worked on this, but you're right it would be useful.

@thiagoveloso
Copy link

Maybe it's useful to take a look at Robert's feedback to an old request of mine here: rspatial/raster#55

@etiennebr
Copy link
Author

Thanks @thiagoveloso, @ptompalski I updated the gist. Let me know if it works for you!

@mikeshewring
Copy link

Updated to terra

as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy=FALSE, inmem = terra::inMemory(x), ...) {
stopifnot(require("data.table"))
if(inmem) {
v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...))
coln <- names(x)
if(xy) coln <- c("x", "y", coln)
setnames(v, coln)
} else {
tr <- blocks(x)
l <- lapply(1:tr$n, function(i) {
DT <- as.data.table(as.data.frame(terra::values(x, row = tr$row[i], nrows = tr$nrows[i]), ...))
if(xy == TRUE) {
cells <- terra::cellFromRowCol(x, c(tr$row[i], tr$row[i] + tr$nrows[i] - 1), c(1, ncol(x)))
coords <- terra::xyFromCell(x, cell = cells[1]:cells[2])
DT[, c("x", "y") := data.frame(terra::xyFromCell(x, cell = cells[1]:cells[2]))]
}
DT
})
v <- rbindlist(l)
coln <- names(x)
if(xy) {
coln <- c("x", "y", coln)
setcolorder(v, coln)
}
}
v
}

@etiennebr
Copy link
Author

Hi Mike, thanks for sharing! The current gist works for me with terra.

# after sourcing the gist
terra::rast(matrix(1, 2, 2)) %>% as.data.table(xy = TRUE)
#> x   y lyr.1
#> 1: 0.5 1.5     1
#> 2: 1.5 1.5     1
#> 3: 0.5 0.5     1
#> 4: 1.5 0.5     1

But maybe there are some edge cases that are not supported. Let me know if there anything missing from the actual gist that didn't work for you on terra.

@tylerhoecker
Copy link

tylerhoecker commented Nov 8, 2023

Thanks @etiennebr for this super useful function! I used it successfully with terra. I did encounter an error when applying this function over a list of SpatRasters using purrr::map, but not when using lapply. A non-repro example:

# WORKS
dt_list <- lapply(spatrast_list, FUN = as.data.table)

# ERROR
dt_list <- spatrast_list %>% 
  map(., as.data.table())

# Error message: Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘as.data.table’ for signature ‘"missing"’

@etiennebr
Copy link
Author

Thanks @tylerhoecker, it seems that the as.data.table function should be called the same way than for lapply. Could you try:

dt_list <- spatrast_list %>% 
  map(as.data.table)
# or
dt_list <- map(spatrast_list, as.data.table)

@stephstewart02
Copy link

Hi @etiennebr, I am a bit confused about why the main body of code doesn't incorporate a solution to the issue Idemaz identified. Is there a particular reason? I am trying to work with a package that relied on the main code, understandably because they probably thought that it functioned correctly. It is causing issues because I am working with a large amount of raster data, and want the x and y columns. Is the solution provided in the comments the best solution?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment