I have some long-standing package code that uses raster::rasterize that I'm trying to update to terra::rasterize. The code takes point data, where each point has one of two possible integer ID values. The output is a raster with two layers, one for each possible point ID, where cell values are counts. The relevant bits are:
# r0 is template raster to define extent and resolution
r <- raster::rasterize(dat[, c("X", "Y")],
r0,
field = dat$flightlineID,
fun = f,
background = 0)
Here, f is a function that takes a vector of point IDs and returns a two-element vector of counts, which results in the desired two layer output raster.
My first attempt to port this to terra::rasterize (package version 1.6-17) was...
r <- terra::rasterize(cbind(dat$X, dat$Y), # seem to need a matrix rather than a data frame
r0, # template SpatRaster
values = dat$flightlineID,
fun = f,
background = 0)
This fails with the error:
Error in w[vv[, 1], ] <- vv[, -1] : number of items to replace is not a multiple of replacement length
Delving into the code for terra:::rasterize_points it seems that the number of layers for the output raster is determined by treating the 'values' argument as a data frame and checking the number of columns. This is a bit confusing because the package docs state that the values argument is expected to be a numeric vector, of either length 1 or nrow(x) where x is the input point data. Moreover, the length of the vector returned by the user-supplied summary function doesn't seem to play any part in determining the number of output raster layers.
For the moment I've simply retained the old raster::rasterize code and convert the output raster to a SpatRaster, but I think I must be missing something obvious. Is there a way of using just terra::rasterize to accomplish this task?
EDIT: As requested in comments, here is a small sample of the input point data to show the format. Typical input data sizes range from 2 to 40 million points.
structure(list(X = c(420094, 420067, 420017, 420050, 420058,
420090, 420038, 420040, 420081, 420097, 420075, 420041, 420039,
420062, 420050, 420083, 420019, 420019, 420044, 420087, 420099,
420077, 420030, 420014, 420015, 420051, 420033, 420056, 420041,
420030, 420027, 420024, 420058, 420042, 420063, 420028, 420073,
420053, 420010, 420100, 420048, 420062, 420056, 420080, 420053,
420068, 420074, 420004, 420010, 420078), Y = c(6676049, 6676029,
6676034, 6676019, 6676096, 6676010, 6676003, 6676048, 6676073,
6676023, 6676089, 6676082, 6676010, 6676051, 6676039, 6676099,
6676024, 6676073, 6676040, 6676056, 6676072, 6676086, 6676030,
6676042, 6676002, 6676033, 6676078, 6676073, 6676013, 6676056,
6676055, 6676069, 6676072, 6676089, 6676069, 6676058, 6676023,
6676039, 6676043, 6676017, 6676011, 6676054, 6676095, 6676068,
6676098, 6676077, 6676049, 6676073, 6676097, 6676057), flightlineID = c(2L,
1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L)), row.names = c(NA, -50L), class = "data.frame")
EDIT: In the raster package code, the private .pointsToRaster function has a line (see here) where the length of the output from the user-supplied summary function is checked with some arbitrary test values to determine the number of layers in the output raster. This seems to be absent from the terra package code.
It may be that you don't want this as two layers in one raster, though this is hard to tell with the supplied data as it appears to be all 'within' the overlap. I notice in you package, there is an attempt to throttle/reduce tile edge points that maybe just needs to be set lower than 1K.
That
terradoesn't work the same asrasterwhenrasterize(ing may be a decision that underterraone should intend two layers via making each thenadd<-ing or<- c(ing, whereas withrasterit was assumed via a hard to follow logic of 'field' and 'values'. Using your above data (and keeping two rasters):One might suggest a consistent
mergepolicy where pts1 or pts2 are always favored. In the end, if this is about optimizing allocation of scarce resources, clear bush where you have the best data, inspect, and clear again. But it still seems best to resolve this at thelaslevel upstream.