I am trying to extract a selection of multipolygons with tags related to green spaces from the geofabrik_europe-latest.osm.pbf file using the oe_get() function. The operation finishes and gives me a .gpkg, but it also shows an error message:
Error in sf::gdal_utils(util = "vectortranslate", source = normalizePath(file_path), : gdal_utils vectortranslate: an error occured
In addition: There were 50 or more warnings (use warnings() to see the first 50)
The 50 or more additional warnings all read:
In CPL_gdalvectortranslate(source, destination, options, ... : GDAL Message 1: Non closed ring detected. To avoid accepting it, set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO
The number of features extracted seems very low: about 800k, whereas a test run on the Austria.pbf with the same parameters already yielded about 600k. Visual inspection of both results shows that many polygons are missing from the Europe extract.
What could be the reason for the incomplete extract? The extraction finishes despite the error message, but could the error affect the number of features extracted? Or is there a limit on the number of features that can be extracted?
My code (this takes more than 6 hours to run and requires an already downloaded .pbf file):
library(osmextract)

# Options passed through to GDAL's vectortranslate (ogr2ogr)
poly_amenities_green_low_vectortranslate = c(
  # reproject the output to EPSG:3035
  "-t_srs", "EPSG:3035",
  # keep only these fields
  "-select", "osm_id, name, leisure, landuse, natural, access, garden_type",
  # attribute filter for publicly accessible green spaces
  "-where", "(leisure IN ('park', 'nature_reserve') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry')) OR (leisure = 'garden' AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry') AND landuse NOT IN ('allotments', 'residential') AND garden_type IN ('community', 'botanical', 'public', 'flowerbed', 'municipal', 'street_side')) OR (landuse IN ('grass', 'forest', 'meadow', 'flowerbed', 'village_green') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry')) OR (natural IN ('beach', 'heath', 'wood', 'fell', 'grassland', 'scrub', 'tundra') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry'))"
)

# Convert the already downloaded Europe extract to a .gpkg without reading it into R
oe_get(
  "Europe",
  layer = "multipolygons",
  provider = "geofabrik",
  force_download = FALSE,
  vectortranslate_options = poly_amenities_green_low_vectortranslate,
  extra_tags = c("leisure", "landuse", "access", "natural", "garden:type"),
  download_only = TRUE,
  skip_vectortranslate = FALSE,
  never_skip_vectortranslate = TRUE
)
I tried setting the geometry type to GEOMETRY instead of the default MULTIPOLYGON by specifying "-nlt", "GEOMETRY", to avoid getting stuck on invalid polygons, but this yields the same number of features (just more invalid features with empty geometries).
I do not know where and how to set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO in an osmextract / vectortranslate query (or whether this would help with extracting all the features I want).
As it might be relevant to other users, I am posting the answer to my own question:
Problem identification:
First, to get to the root of my problem, I had to increase the number of messages displayed by changing a setting before my oe_get() command. This revealed that the real issue is not the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option (which only produces a warning and can be left at its default). The issue was rather:
GDAL Error 1: failed to execute insert : database or disk is full
This was curious, because I had 70 gigabytes of disk space left and the (incompletely) extracted GPKG was smaller than 1 gigabyte. It turns out that gdal ogr2ogr writes huge temporary files to the C:\Users\xxx\AppData\Local\Temp folder, which are automatically deleted once the GPKG has been written. In my case, the temporary files took up 70 gigabytes, while the final GPKG was only 300 megabytes.
Solution:
I set my oe_download_directory() to an external SSD so that the vectortranslate operation creates the GPKG on that drive (which also speeds up processing thanks to faster transaction times). Additionally, as written here: https://gdal.org/drivers/vector/osm.html#vector-osm, the CPL_TMPDIR configuration option has to be defined to control where these temporary files are written.
Setting a download directory with osmextract is not enough to define a current directory for gdal ogr2ogr, but putting Sys.setenv(CPL_TMPDIR="D:\\23-02-16_OSM_Geofabrik_Europe") before my oe_get() command worked. In this case I simply used my oe_download_directory() as CPL_TMPDIR as well.
As a result, the extraction and the creation of the GPKG finish just fine. All the intermediate osm_tmp_nodes_xxx files (which are automatically deleted after the process completes) and the final .gpkg output are written to the directory I have set.
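The setup described above can be sketched roughly as follows. The helper name use_gdal_tmpdir() and the demo directory under tempdir() are hypothetical and only serve as illustration; in practice the path would point at the drive with enough free space. The sketch assumes that GDAL picks up CPL_TMPDIR from the environment (as it did in my case) and passes the same folder to oe_get() through its download_directory argument. The oe_get() call is commented out because it needs the downloaded .pbf and several hours to run.

```r
# Hypothetical helper: point GDAL's temporary files at a directory with
# enough free space, creating the directory first if it does not exist.
use_gdal_tmpdir <- function(dir) {
  if (!dir.exists(dir)) {
    dir.create(dir, recursive = TRUE)
  }
  dir <- normalizePath(dir, winslash = "/", mustWork = TRUE)
  # file.access() returns 0 when write permission (mode = 2) is granted
  if (file.access(dir, mode = 2) != 0) {
    stop("Directory is not writable: ", dir)
  }
  # Set the GDAL configuration option through an environment variable
  Sys.setenv(CPL_TMPDIR = dir)
  invisible(dir)
}

# Placeholder path for the demo; replace it with a folder on the large drive.
work_dir <- use_gdal_tmpdir(file.path(tempdir(), "osm_work"))

# Then run the extraction with the same folder as download directory:
# library(osmextract)
# oe_get(
#   "Europe",
#   layer = "multipolygons",
#   provider = "geofabrik",
#   download_directory = work_dir,
#   vectortranslate_options = poly_amenities_green_low_vectortranslate,
#   extra_tags = c("leisure", "landuse", "access", "natural", "garden:type"),
#   download_only = TRUE,
#   never_skip_vectortranslate = TRUE
# )
```

Checking up front that the folder exists and is writable avoids discovering a wrong path only after hours of processing.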
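For the diagnosis step, the exact snippet is not reproduced in this post. One possible way to get past the first 50 warnings, offered here as an assumption rather than as the original code, is base R's nwarnings option, followed by filtering the stored warnings for GDAL errors. The helper find_messages() is hypothetical, and the sample messages are the ones quoted in this post.

```r
# Keep up to 10000 warnings instead of the default 50, so that messages
# hidden behind the repeated "Non closed ring" warnings are not discarded.
old_opts <- options(nwarnings = 10000)

# ... run oe_get() here ...

# Hypothetical helper: return the unique messages that contain a pattern.
find_messages <- function(msgs, pattern = "GDAL Error") {
  unique(grep(pattern, msgs, value = TRUE, fixed = TRUE))
}

# In an interactive session the message texts are the names of warnings():
# find_messages(names(warnings()))

# Demonstration on sample messages quoted in this post:
sample_msgs <- c(
  "GDAL Message 1: Non closed ring detected. To avoid accepting it, set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO",
  "GDAL Error 1: failed to execute insert : database or disk is full",
  "GDAL Message 1: Non closed ring detected. To avoid accepting it, set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO"
)
gdal_errors <- find_messages(sample_msgs)
print(gdal_errors)

# Restore the previous setting
options(old_opts)
```

Filtering on "GDAL Error" separates the actual failure from the many harmless ring warnings.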