How to use rgbif to download occurrence data from multiple polygons at once?


I am trying to download occurrence data for multiple taxonomic groups from multiple regions using the R package rgbif. I would prefer a single download that combines all the regions, because I have thousands of regions and separate downloads would be impractical. However, I have not found a way to do this: occ_download only works for one region per query.

Here is my example code:

library(rgbif)
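# GBIF taxon keys: 212 = Aves, 359 = Mammalia
# gbif_user, gbif_pwd and gbif_email hold my GBIF login details
gbif_taxon_keys = c(212, 359)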
# below are the bounding boxes of 3 regions;
# my real polygons are also WKT, but they are clockwise
# (how do I convert them to counter-clockwise?)
wkts = c("POLYGON((11.3431 47.2451,11.4638 47.2451,11.4638 47.2919,11.3431 47.2919,11.3431 47.2451))",
"POLYGON((12.9644 47.7608,13.0922 47.7608,13.0922 47.8453,12.9644 47.8453,12.9644 47.7608))",
"POLYGON((14.2284 48.2217,14.3669 48.2217,14.3669 48.3443,14.2284 48.3443,14.2284 48.2217))")

# this works
queries = occ_download_prep(
  pred_in("taxonKey", gbif_taxon_keys),
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE),
  pred_within(wkts[1]),
  user = gbif_user, pwd = gbif_pwd,
  email = gbif_email)
out_test = occ_download_queue(.list = list(queries))

# now try to combine regions in one download
# this does not work
queries = occ_download_prep(
  pred_in("taxonKey", gbif_taxon_keys),
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE),
  pred_within(wkts),
  user = gbif_user, pwd = gbif_pwd,
  email = gbif_email)
out_test = occ_download_queue(.list = list(queries))
Error: 'value' must be length 1
# this does not work either (it runs, but the download fails)
queries = occ_download_prep(
  pred_in("taxonKey", gbif_taxon_keys),
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE),
  pred("geometry", paste0(wkts, collapse = ";")),
  user = gbif_user, pwd = gbif_pwd,
  email = gbif_email)
out_test = occ_download_queue(.list = list(queries))
<<gbif download metadata>>
  Status: KILLED

My GBIF download center says: "The download request was unsuccessful."

Can anyone help with this? Thanks!
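On the side question in the code comments (converting clockwise polygons to counter-clockwise): one way to do it without extra packages is to compute each ring's signed area with the shoelace formula, which is positive for a counter-clockwise ring and negative for a clockwise one (taking x = longitude, y = latitude), and to reverse the point order when it is negative. A minimal base-R sketch, assuming each WKT string is a single-ring "POLYGON((...))" with no holes and an upper-case keyword; the helper names ring_points, signed_area and to_ccw are hypothetical:

```r
# Hypothetical helper: split a single-ring "POLYGON((x y,x y,...))" string
# into a character vector of "x y" point strings (no holes assumed)
ring_points = function(wkt) {
  inner = gsub("^\\s*POLYGON\\s*\\(\\(|\\)\\)\\s*$", "", wkt)
  trimws(strsplit(inner, ",")[[1]])
}

# Hypothetical helper: shoelace formula on a closed ring (last point == first);
# positive result = counter-clockwise, negative = clockwise
signed_area = function(pts) {
  xy = matrix(as.numeric(unlist(strsplit(pts, "\\s+"))), ncol = 2, byrow = TRUE)
  n = nrow(xy)
  0.5 * sum(xy[-n, 1] * xy[-1, 2] - xy[-1, 1] * xy[-n, 2])
}

# Hypothetical helper: reverse the ring if it is clockwise; the original
# coordinate text is reused, so numbers are not reformatted
to_ccw = function(wkt) {
  pts = ring_points(wkt)
  if (signed_area(pts) < 0) pts = rev(pts)
  paste0("POLYGON((", paste(pts, collapse = ","), "))")
}
```

Applied to the three bounding boxes in the question, to_ccw() returns them unchanged because they already run counter-clockwise; wkts = vapply(wkts, to_ccw, character(1), USE.NAMES = FALSE) converts a whole vector.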


There are 2 answers below.

Daijiang Li

I think I figured out how to do this. I combined all the polygons into a multipolygon, and it seems to work.

In other words, I put the three polygons above into a single MULTIPOLYGON:

wkts2 = "MULTIPOLYGON (((11.3431 47.2451, 11.4638 47.2451, 11.4638 47.2919, 11.3431 47.2919, 11.3431 47.2451)), ((12.9644 47.7608, 13.0922 47.7608, 13.0922 47.8453, 12.9644 47.8453, 12.9644 47.7608)), ((14.2284 48.2217, 14.3669 48.2217, 14.3669 48.3443, 14.2284 48.3443, 14.2284 48.2217)))"
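With thousands of regions, writing that MULTIPOLYGON by hand is not practical. A minimal base-R sketch that builds it from the wkts vector in the question, assuming every element is a single "POLYGON((...))" string; the function name polygons_to_multipolygon is hypothetical. Like the hand-written version, it only concatenates the polygons and does not merge regions that overlap:

```r
# Hypothetical helper: wrap several POLYGON WKT strings into one MULTIPOLYGON
polygons_to_multipolygon = function(wkts) {
  # Strip the leading "POLYGON" keyword, keeping the "((...))" ring text
  rings = sub("^\\s*POLYGON\\s*", "", wkts)
  # Each polygon becomes one "((...))" member of the MULTIPOLYGON
  paste0("MULTIPOLYGON (", paste(rings, collapse = ", "), ")")
}
```

The result, for example wkts2 = polygons_to_multipolygon(wkts), can then be passed to pred_within().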

Then I run:

queries = occ_download_prep(
  pred_in("taxonKey", gbif_taxon_keys),
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE),
  pred_within(wkts2),
  user = gbif_user, pwd = gbif_pwd,
  email = gbif_email)
out_test = occ_download_queue(.list = list(queries))

It works for this example. @sckott may have better approaches.

MattBlissett

If any of the regions overlap, just concatenating the polygons into a multipolygon produces overlapping parts. Such a multipolygon isn't valid, and the download will fail.

Instead, use a GIS library to combine the polygons. This is the first I found for R:

library(sf)
x = st_as_sfc("POLYGON((5.032 52.237, 5.426 52.237, 5.426 52.525, 5.032 52.525, 5.032 52.237))")
y = st_as_sfc("POLYGON((5.234 52.033, 5.546 52.033, 5.546 52.311, 5.234 52.311, 5.234 52.033))")
u = st_union(x, y)

st_as_text(u)
[1] "POLYGON ((5.032 52.525, 5.426 52.525, 5.426 52.311, 5.546 52.311, 5.546 52.033, 5.234 52.033, 5.234 52.237, 5.032 52.237, 5.032 52.525))"

A quick check on Wicket shows we now have an 8-sided polygon, which should work with pred_within().
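The same approach scales to a whole vector of WKT strings: st_as_sfc() parses a character vector into one geometry set, and st_union() called with a single geometry set dissolves all of its members into one geometry. A sketch, assuming the sf package is installed and the polygons are parsed without a CRS (so GEOS rather than s2 performs the union); the third polygon is one of the question's boxes, included to show that disjoint regions come back as a MULTIPOLYGON:

```r
library(sf)

wkts = c(
  "POLYGON((5.032 52.237, 5.426 52.237, 5.426 52.525, 5.032 52.525, 5.032 52.237))",
  "POLYGON((5.234 52.033, 5.546 52.033, 5.546 52.311, 5.234 52.311, 5.234 52.033))",
  "POLYGON((14.2284 48.2217, 14.3669 48.2217, 14.3669 48.3443, 14.2284 48.3443, 14.2284 48.2217))"
)

# Parse all WKT strings into one sfc geometry set (no CRS assigned)
geoms = st_as_sfc(wkts)

# Dissolve everything into a single geometry; overlapping parts are merged,
# disjoint parts become separate members of a MULTIPOLYGON
merged = st_union(geoms)

# Back to WKT, ready for pred_within()
merged_wkt = st_as_text(merged)
```

Note that the st_as_text() output printed in this answer runs clockwise (east along the northern edge, then south), so if counter-clockwise ring order is required, check the orientation of the merged rings before submitting.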

Using this, you can probably put all your polygons into a single download. The limit is 10,000 points in total for a single download.
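To check a combined WKT string against that limit before submitting, a rough approach is to count coordinate pairs. In WKT every pair is separated from the next by a comma (within a ring, between rings and between polygons), so the number of pairs is the number of commas plus one. A minimal base-R sketch; the function name count_wkt_points is hypothetical, and it counts each ring's repeated closing point, which may differ slightly from how GBIF counts:

```r
# Hypothetical helper: approximate number of coordinate pairs in a WKT string
count_wkt_points = function(wkt) {
  # Each comma separates two consecutive coordinate pairs
  n_commas = lengths(regmatches(wkt, gregexpr(",", wkt, fixed = TRUE)))
  n_commas + 1L
}
```

If the count is over the limit, sf::st_simplify() with a positive dTolerance is one way to drop vertices, at the cost of slightly altered boundaries.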