Error std::bad_alloc using `terra::extract` on large stack and many points


I'm trying to run a bilinear `terra::extract()` on more than 2,700,000 point coordinates against a SpatRaster stack with many layers (2880, although each layer has only 100 cells) and I'm getting "Error: std::bad_alloc", probably because I run out of memory.
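For context, a back-of-the-envelope calculation (my own estimate, not output from terra) suggests why the allocation fails: the result alone needs one double per point per layer:

```r
## Rough size of the full extract() result: one double per point per layer
n_points <- 2785518
n_layers <- 2880
n_points * n_layers * 8 / 1024^3  # ~60 GiB -- well beyond typical RAM
```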

I thought about processing this sequentially with `lapply()` over a list version of my stack (something like `as.list(stk)`), but based on a benchmarking exercise with the first two layers I think that would take ages, so I was hoping for another solution.

I could probably split the SpatRaster into sections myself, but I thought terra would do that automatically by processing in chunks.

Here's a reprex:

library(terra)
library(data.table)

r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25),  crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- rast(rep(list(r), 2880))

xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
                  y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))

tester <- extract(stk, xyz, method = "bilinear") 
# Error: std::bad_alloc

I tried setting terraOptions(memmax = 0.1, memmin = 0.1) to see whether that would increase the number of processing chunks (or trigger chunking at all), but it didn't, and I get the same error.

> mem_info(stk)

------------------------
Memory (GB) 
------------------------
check threshold : 1 (memmin)
available       : 23.98
allowed (60%)   : 14.39
needed (n=1)    : 0
------------------------
proc in memory  : TRUE
nr chunks       : 1
------------------------
> terraOptions(memmax = 0.1, memmin = 0.1)
> mem_info(stk)

------------------------
Memory (GB) 
------------------------
check threshold : 0.1 (memmin)
available       : 0.1 (memmax)
allowed (60%)   : 0.06
needed (n=1)    : 0
------------------------
proc in memory  : TRUE
nr chunks       : 1
------------------------
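My reading of this output (an assumption on my part, not documented behaviour): terra's chunking decision appears to be based on the size of the raster's cell data, which here is tiny, so it never splits the job; the allocation that fails is the points-by-layers result of `extract()`, which `memmax` does not seem to govern:

```r
## Cell data terra would need to hold -- what the chunking heuristic looks at
cells_per_layer <- 10 * 11   # reprex raster: nrow = 10, ncol = 11
n_layers <- 2880
cells_per_layer * n_layers * 8 / 1024^3  # ~0.0024 GB, hence "nr chunks : 1"
```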

This issue is potentially related to "operation on a very large raster in terra causes std::bad_alloc", but I couldn't find a solution there.

UPDATE: Following @robert-hijmans's second suggestion, here's a solution that uses extract(..., method = "bilinear") and avoids extracting the same cell values more than once:

library(terra)
library(data.table)

r <- rast(nrow = 10, ncol = 11, ext = c(-130.3125, -109.6875, 45.625, 58.125), resolution = c(1.875, 1.25),  crs = "EPSG:4326")
r[] <- runif(n = ncell(r))
stk <- rast(rep(list(r), 2880))

xyz <- data.table(x = sample(seq(-130.3125, -109.6875, by = 0.0001), size = 2785518, replace = TRUE),
                  y = sample(seq(45.625, 58.1255, by = 0.00001), size = 2785518, replace = TRUE))

## get raster cell coordinates for the selected points:
point2CellCoords <- data.table(cell = cellFromXY(stk, xyz), xyz)

## now make a vector of unique cell coordinates
uniqueCoords <- data.table(cell = point2CellCoords$cell, xyFromCell(stk, point2CellCoords$cell)) |>
  na.omit() |>
  unique()

## extract for unique locations and bind back to all points.
tester <- extract(stk, uniqueCoords[, .(x, y)], method = "bilinear") |>
  as.data.table()
tester$cell <- uniqueCoords$cell

result <- tester[point2CellCoords, on = "cell"]


Answer by Robert Hijmans (accepted):

You can do the extraction in chunks. That is, do not split the rasters, but rather do the extraction for subsets of the points (rows of xyz).

For example

i <- ceiling(seq_len(nrow(xyz)) / 500000)  # chunk index for each point
u <- unique(i)
tester <- lapply(u, function(j) extract(stk, xyz[i == j, ], method = "bilinear"))
tester <- do.call(rbind, tester)

Since you have 2,700,000 points and only 100 cells, you could do the following to avoid memory limitations. With raster r and points p you could do

cells <- cellFromXY(r, p)
v <- values(r)
result <- v[cells, ]  

But since you use extract with method = "bilinear", that may not apply directly. In that case it could still be useful to extract the values only for the unique coordinates, and then merge the results with the original points.