How to use parallelization with raster::extract in R using furrr


I am unsure whether this is a bug or a how-to question. I posted it elsewhere first and was told to ask on Stack Overflow.

library(tidyverse)
library(tigris)
library(elevatr)
library(raster)
library(sf)
library(furrr)

multco <- tigris::tracts(state = "OR", 
    county = "Multnomah") %>% 
  st_transform(2913) %>% 
  st_point_on_surface()

ex_elev <- elevatr::get_elev_raster(
    locations = st_bbox(multco) %>% st_as_sfc(), 
    z = 5)

# This works
ev <- raster::extract(ex_elev, multco, 
    fun = mean, na.rm = T, buffer = 100)

## This fails
ev2 <- multco %>% 
  furrr::future_map_dbl(.f = function(point){
    raster::extract(ex_elev, point, fun = mean, na.rm = T, buffer = 100)}, 
             .options = furrr_options(seed = TRUE,
                                      packages = c("raster", "sf")))

The second call fails with the following error:

Error in round(y) : non-numeric argument to mathematical function

The same extraction works with serial processing, however.

I'm not sure if it's a {raster} issue or a {future} issue or a {furrr} issue. If anyone has luck using furrr-based parallelization and mapping with {raster} functions, please let me know!

Edit 1: Changed code to fully reproducible example.

Answer by Elia:

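As far as I know, parallel extraction is rarely needed. The overhead of passing the data to the workers often costs more than running the extraction sequentially. That said, purrr::map and its parallel counterparts in furrr iterate over the elements of a list. An sf object is a data frame, i.e. a list of columns, so future_map_dbl visits its columns rather than its rows, which is presumably why extract ends up receiving a non-numeric argument. You therefore have to convert your sf object to a list with one element per row. See my example below, with a small timing benchmark of different approaches: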
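To make the column-versus-row point concrete, here is a minimal sketch in base R. It uses a plain data frame with made-up values as a stand-in for the sf object (the names pts, by_column and by_row are hypothetical); an sf object behaves the same way in this respect because it is also a data frame.

```r
# Plain data frame standing in for the sf object (hypothetical toy data).
# Like an sf object, it is a list of columns under the hood.
pts <- data.frame(id = c("a", "b", "c"), elev = c(10, 20, 30))

# Iterating over the data frame visits its columns, not its rows.
by_column <- lapply(pts, class)
names(by_column)   # "id" "elev"

# split() on a row index gives a list with one single-row data frame per row,
# which is the shape a row-wise map() or future_map() call needs.
by_row <- split(pts, seq_len(nrow(pts)))
length(by_row)     # 3
```

With the real data, the first column handed to the function would be an attribute column rather than a point geometry, so the function never sees a single point unless the object is split by row first.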

library(tidyverse)
library(tigris)
library(elevatr)
library(raster)
library(sf)
library(furrr)


# This works
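# raster::extract on all points at once
system.time(ev <- raster::extract(ex_elev, multco,
                                  fun = mean, na.rm = TRUE, buffer = 100)) # elapsed: 51.84 s

# terra::extract
system.time(ev <- terra::extract(ex_elev, multco,
                                 fun = mean, na.rm = TRUE, buffer = 100)) # elapsed: 57.2 s

# exactextractr on 100-unit buffers around the points
system.time(ev <- exactextractr::exact_extract(ex_elev, st_buffer(multco, 100),
                                               "mean")) # elapsed: 0.43 s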



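# In parallel: split the sf object into a list with one single-row sf object per point
xy.list <- split(multco, seq_len(nrow(multco)))

# Start the parallel workers
plan(multisession)

system.time(ev2 <- xy.list %>% 
  furrr::future_map_dbl(.f = function(point){
    raster::extract(ex_elev, point, fun = mean, na.rm = TRUE, buffer = 100)}, 
    .options = furrr_options(seed = TRUE,
                             packages = c("raster", "sf")))
) # elapsed: 208 s

# Switch back to sequential processing
plan(sequential)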

The comment after each approach gives the elapsed time in seconds on my machine (64 GB RAM, 48 logical cores). As you can see, with your toy data, the exact_extract approach is by far the fastest.
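If the parallel route is still wanted, one variation that might be worth timing is to split the points into a few larger chunks instead of one element per row, so that each raster::extract call handles many points. The sketch below only shows the chunking step, in base R, on a made-up data frame standing in for multco; the names pts, n_chunks, chunk_id, chunks and ev3 are hypothetical, and the commented furrr call is untested. As far as I know furrr already batches list elements across workers, so any gain would come from fewer extract calls, and I have not measured whether it helps.

```r
# Hypothetical toy data standing in for the 'multco' points.
pts <- data.frame(id = 1:10, x = (1:10) * 1.5, y = (1:10) * 2)

# Assumption: three chunks; in practice this could follow future::nbrOfWorkers().
n_chunks <- 3

# Assign consecutive rows to chunks of roughly equal size.
chunk_id <- cut(seq_len(nrow(pts)), breaks = n_chunks, labels = FALSE)
chunks <- split(pts, chunk_id)

# Each chunk could then be passed to a single extract call, e.g. (untested sketch):
# ev3 <- furrr::future_map(chunks, function(chunk) {
#   raster::extract(ex_elev, chunk, fun = mean, na.rm = TRUE, buffer = 100)
# }) %>% unlist(use.names = FALSE)
```

Because the rows are assigned to chunks in order, unlisting the per-chunk results should return the values in the original row order.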