osmextract query yields vectortranslate error and seemingly incomplete extraction of features

I am trying to extract a selection of multipolygons with tags related to green spaces from the geofabrik_europe-latest.osm.pbf file using the oe_get() function. The operation finishes and gives me a .gpkg, but also shows an error message:

Error in sf::gdal_utils(util = "vectortranslate", source = normalizePath(file_path), : gdal_utils vectortranslate: an error occured
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The 50 or more additional warnings all read:

In CPL_gdalvectortranslate(source, destination, options, ... : GDAL Message 1: Non closed ring detected. To avoid accepting it, set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO

The number of features extracted seems very low: about 800k, whereas a test run on the Austria.pbf with the same parameters already yielded about 600k. Visual inspection of both results also shows that many polygons are missing from the Europe extract.

What could be the reason for the incomplete extract? The extraction finishes despite the error message, but could the error affect the number of features extracted? Or is there a limit on the number of features that can be extracted?

My code: (this takes more than 6 hours to extract and requires an already downloaded .pbf file)

library(osmextract)
poly_amenities_green_low_vectortranslate = c(
  "-t_srs", "EPSG:3035",
  "-select", "osm_id, name, leisure, landuse, natural, access, garden_type", 
  "-where", "(leisure IN ('park', 'nature_reserve') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry')) OR (leisure = 'garden' AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry') AND landuse NOT IN ('allotments', 'residential') AND garden_type IN ('community', 'botanical', 'public', 'flowerbed', 'municipal', 'street_side')) OR (landuse IN ('grass', 'forest', 'meadow', 'flowerbed', 'village_green') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry')) OR (natural IN ('beach', 'heath', 'wood', 'fell', 'grassland', 'scrub', 'tundra') AND access NOT IN ('private', 'no', 'customers', 'permit', 'license', 'restricted', 'agricultural', 'forestry'))"
)
oe_get(
  "Europe", layer = "multipolygons", provider = "geofabrik", force_download = FALSE,
  vectortranslate_options = poly_amenities_green_low_vectortranslate,
  extra_tags = c("leisure", "landuse", "access", "natural", "garden:type"),
  download_only = TRUE, skip_vectortranslate = FALSE, never_skip_vectortranslate = TRUE
)

I tried setting the geometry type to GEOMETRY instead of the default MULTIPOLYGON by specifying "-nlt", "GEOMETRY", hoping to avoid getting stuck on invalid polygons, but this yields the same number of features (just more invalid features with empty geometries). I do not know where or how to set the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option to NO in an osmextract / vectortranslate query, or whether doing so would help with extracting all the features I want.

Answer by JiannisK

As this might be relevant to other users, I am posting the answer to my own question:

Problem identification:

First, to get to the bottom of my problem, I had to increase the number of warnings that R stores (so that warnings() shows more than the first 50) by putting this before my oe_get() command:

options(nwarnings = 10000)

This revealed that the real issue is not the OGR_GEOMETRY_ACCEPT_UNCLOSED_RING configuration option (that message is just a warning, and the option can be left at its default).

The actual issue was this message: GDAL Error 1: failed to execute insert : database or disk is full. This was puzzling, because I had 70 gigabytes of disk space left and the (incompletely) extracted GPKG was smaller than 1 gigabyte. It turns out that GDAL's ogr2ogr writes huge temporary files to the C:\Users\xxx\AppData\Local\Temp folder, which are deleted automatically once the GPKG has been written. In my case, the temporary files amounted to 70 gigabytes, while the final GPKG was only 300 megabytes.
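Because the decisive message was buried under thousands of ring warnings, it can help to collect the warnings programmatically and filter them. The following is only a minimal sketch: collect_warnings() is a hypothetical helper (not part of osmextract or sf), the call it wraps is simulated with the messages quoted above, and the "GDAL Error" filter pattern assumes that error-level messages always carry that prefix.

```r
# Hypothetical helper (not part of osmextract or sf): evaluates an expression,
# records every warning message, and still returns if the expression errors.
collect_warnings <- function(expr) {
  msgs <- character(0)
  value <- tryCatch(
    withCallingHandlers(
      expr,
      warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))  # store the message text
        invokeRestart("muffleWarning")         # continue without printing it
      }
    ),
    error = function(e) e                      # keep the error object as the value
  )
  list(value = value, warnings = msgs)
}

# Simulated stand-in for the long-running oe_get() call, using the messages above
res <- collect_warnings({
  warning("GDAL Message 1: Non closed ring detected.")
  warning("GDAL Error 1: failed to execute insert : database or disk is full")
  stop("gdal_utils vectortranslate: an error occured")
})

# Keep only the distinct error-level messages (assumed prefix: "GDAL Error")
gdal_errors <- unique(grep("GDAL Error", res$warnings, value = TRUE))
print(gdal_errors)
```

In a real session the braces would contain the oe_get() call instead of the simulated warnings.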

Solution: I set my osmextract download directory (the one reported by oe_download_directory()) to an external SSD, so that the vectortranslate operation creates the .gpkg on that drive (which also speeds up processing thanks to faster transaction times). Additionally, as described at https://gdal.org/drivers/vector/osm.html#vector-osm, the CPL_TMPDIR configuration option has to be defined, because:

"The driver will use an internal SQLite database to resolve geometries. If that database remains under 100 MB it will reside in RAM. If it grows above, it will be written in a temporary file on disk. By default, this file will be written in the current directory, unless you define the CPL_TMPDIR configuration option."

Setting a download directory in osmextract is not enough to change the current directory used by GDAL's ogr2ogr. What worked was putting Sys.setenv(CPL_TMPDIR="D:\\23-02-16_OSM_Geofabrik_Europe") before my oe_get() command. In this case I simply used the same directory for CPL_TMPDIR as for oe_download_directory().
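Put together, the setup before the extraction might look like the sketch below. The directory is a stand-in built from tempdir() so that the snippet runs anywhere; in practice it would be a folder on a drive with plenty of free space. Setting OSMEXT_DOWNLOAD_DIRECTORY relies on osmextract reading its default download directory from that environment variable, and the oe_get() call is left commented out because it runs for hours.

```r
# Stand-in working directory (assumption: replace with a folder on a large drive)
work_dir <- file.path(tempdir(), "osm_work")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)

# GDAL reads configuration options such as CPL_TMPDIR from environment variables;
# the OSM driver then writes its temporary files to this folder
Sys.setenv(CPL_TMPDIR = work_dir)

# Default download directory used by osmextract for the .pbf and .gpkg files
Sys.setenv(OSMEXT_DOWNLOAD_DIRECTORY = work_dir)

# Confirm what the R session will pass on to GDAL
print(Sys.getenv("CPL_TMPDIR"))

# Afterwards, run the extraction exactly as in the question:
# oe_get("Europe", layer = "multipolygons", provider = "geofabrik",
#        vectortranslate_options = poly_amenities_green_low_vectortranslate,
#        extra_tags = c("leisure", "landuse", "access", "natural", "garden:type"),
#        download_only = TRUE, never_skip_vectortranslate = TRUE)
```

Both variables have to be set in the same R session, before oe_get() is called.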

As a result, the extraction and the creation of the GPKG now finish without errors. The intermediate osm_tmp_nodes_xxx files (which are deleted automatically once the process completes) and the final .gpkg output are all written to the directory I set.
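To see how much space the intermediate files need while an extraction is running, their total size can be checked from a second R session. This is a rough sketch: tmp_usage_gb() is a hypothetical helper, and the "^osm_tmp_" pattern is an assumption based on the osm_tmp_nodes_xxx file names mentioned above.

```r
# Hypothetical helper: total size in GB of the files in `dir` whose names match `pattern`
tmp_usage_gb <- function(dir, pattern = "^osm_tmp_") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  sum(file.size(files), na.rm = TRUE) / 1024^3
}

# Demonstration on a throwaway directory containing a dummy 1 MiB file
demo_dir <- file.path(tempdir(), "tmp_usage_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeBin(raw(1024^2), file.path(demo_dir, "osm_tmp_nodes_demo"))
print(tmp_usage_gb(demo_dir))

# During a real extraction: tmp_usage_gb(Sys.getenv("CPL_TMPDIR"))
```

Comparing the result with the free space on the target drive gives an early indication of whether the run will hit the "database or disk is full" error again.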