Let assume I have two parquet file aka main and updates with some data as shown below. I need to implement upsert kind of operation in duck db.
parquet file main: id, name, city
data:
id name city
1 a p
2 b q
3 c r
parquet file updates: id, name, city
data:
id name city
1 a m
4 b q
desired output:-
id name city
1 a m <----update city
2 b q
3 c r
4 b q <----insert
currently I am doing the same using below queries:-
create table main as select * from '/tmp/main.parquet';
create table stage as select * from '/tmp/updates.parquet';
delete from main using stage where main.id=stage.id;
insert into main select * from stage;
COPY main TO '/tmp/final.parquet' (FORMAT 'PARQUET', CODEC 'ZSTD');
but only thing is create table will load and keep all the data in memory what I don't want as the main file may contain 8-10 million records, is there any way where I can achive the same using joins only and avoid creating main and stage tables.
DuckDB has upsert support, see the docs: https://duckdb.org/docs/sql/statements/insert#on-conflict-clause