The import speed for ways with osm2pgsql is very slow. I need tips on how to speed up the import.
Server Setup
6-core Xeon
20 GB RAM
1400 GB SAS
Speed test of my SAS drive
hdparm -tT /dev/sda2
/dev/sda2:
Timing cached reads: 7492.82 MB/sec
SG_IO: 240.88 MB/sec
Import script
osm2pgsql --slim -r pbf -S mapfeatures_export.style -C 14000 -l --number-processes 5 -U osmdata --create --multi-geometry -d osmdatabase planet-190603.osm.pbf
osm2pgsql console output during run
osm2pgsql version 0.96.0 (64 bit id space)
Using built-in tag processing pipeline
Using projection SRS 4326 (Latlong)
Setting up table: planet_osm_point
Setting up table: planet_osm_line
Setting up table: planet_osm_polygon
Setting up table: planet_osm_roads
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=14000MB, maxblocks=224000*65536, allocation method=11
Mid: pgsql, cache=14000
Setting up table: planet_osm_nodes
Setting up table: planet_osm_ways
Setting up table: planet_osm_rels
Reading in file: /home/osmdata/planet-190603.osm.pbf
Using PBF parser.
Processing: Node(5234820k 342.0k/s) Way(15203k 0.20k/s) Relation(0 0.00/s)
I tested an SSD setup, where the way import speed was 50k/s, but it is too expensive.
I followed the optimisation guidelines from https://www.geofabrik.de/media/2012-09-08-osm2pgsql-performance.pdf
I hope there are some additional options to tweak.
Direct answer: Try lowering your -C to around 4096; you probably don't have enough memory for both Postgres and the osm2pgsql process. In addition, try '--flat-nodes /tmp/mycache.bin' to speed up processing with a direct local store instead of using the database. You might also try --unlogged, but read the docs to make sure you are okay with the consequences.
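A rough sketch of how your command could look with those changes applied (same style file, database and planet file as above; /tmp/mycache.bin is just the example path from the suggestion, and on a full planet this file grows to tens of GB, so put it on a fast disk with enough space):
# Lower the node cache so PostgreSQL keeps enough RAM, and store node
# locations in a flat file instead of the planet_osm_nodes table.
# Add --unlogged only if you accept losing the tables on a crash.
osm2pgsql --slim -r pbf -S mapfeatures_export.style -C 4096 -l \
  --number-processes 5 -U osmdata --create --multi-geometry \
  --flat-nodes /tmp/mycache.bin -d osmdatabase planet-190603.osm.pbf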
Other thoughts: if you are in a cloud environment, use a higher-performance machine for loading, like an AWS r5a.2xlarge, then drop down to something closer to a t3.medium for serving content. In the cloud you can dedicate machines to different purposes, or even use container hosts like Fargate for processing and pay only while running the load or updates.
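For the "load big, serve small" pattern on EC2, a minimal sketch with the AWS CLI (the instance ID is a placeholder; this assumes an EBS-backed instance, which can be resized in place):
# Stop the loader instance, shrink it for serving, and start it again.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 wait instance-stopped --instance-ids i-0123456789abcdef0
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"t3.medium\"}"
aws ec2 start-instances --instance-ids i-0123456789abcdef0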
My current config, on a 64G memory machine (r5a.2xlarge):
Increased to a machine with a 7.2 TB NVMe SSD drive, 12 CPUs, and 96 GB of memory.
Other thoughts: these settings could surely be tuned for better performance, but for a quick setup this performs well and is cheap, as long as you don't leave the machines at these sizes during normal operations after loading. The above architecture only costs about $8 to load North America, so it is hardly worth the labor to optimize further.
Next-step thoughts for increasing speed if running Postgres locally:
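As a sketch of the kind of PostgreSQL settings commonly relaxed for a one-off bulk import (values are illustrative for a machine of roughly this size; I'm assuming the osmdata role has superuser rights, otherwise put the same values in postgresql.conf and restart):
# Bulk-load settings; fsync/full_page_writes/autovacuum off trade crash
# safety for speed, so only use them during the initial import.
psql -U osmdata -d osmdatabase <<'SQL'
ALTER SYSTEM SET shared_buffers = '4GB';          -- takes effect after restart
ALTER SYSTEM SET work_mem = '256MB';
ALTER SYSTEM SET maintenance_work_mem = '4GB';
ALTER SYSTEM SET synchronous_commit = 'off';
ALTER SYSTEM SET fsync = 'off';
ALTER SYSTEM SET full_page_writes = 'off';
ALTER SYSTEM SET checkpoint_completion_target = '0.9';
ALTER SYSTEM SET max_wal_size = '10GB';
ALTER SYSTEM SET autovacuum = 'off';
SQL
sudo systemctl restart postgresql
Turn fsync, full_page_writes and autovacuum back to their defaults once the import and index builds have finished.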
RDS commands, for anyone interested, from https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appendix.PostgreSQL.CommonDBATasks.html#Appendix.PostgreSQL.CommonDBATasks.PostGIS
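The gist of those steps, assuming the database name from the question; the endpoint and master user name are placeholders:
# Run as the RDS master (rds_superuser) user against the target database.
psql -h <rds-endpoint> -U masteruser -d osmdatabase <<'SQL'
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS postgis_topology;
-- hstore is only needed if you run osm2pgsql with --hstore
CREATE EXTENSION IF NOT EXISTS hstore;
SQL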