I am working on a project involving incremental data loading, and I need to implement an Azure data warehouse to the following specifications:
Example situation: I have 2 parquet files with the same structure; one of them is in the data lake, and the other has already been loaded into a table in a dedicated SQL pool.
What steps should I go through to end up with a table that merges the 2 files (updating existing rows matched on a specific ID, and inserting new rows when no match is found)?
I would prefer not to use external tables, as their performance is slower.
The target table has to be the same table into which the 2nd parquet file has already been loaded.
You could, in any case, define a simple Synapse pipeline in which you read both the parquet file and the table from the dedicated SQL pool, merge the two data flows, and then sink the result to the target SQL table by means of an upsert.
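If you would rather script the upsert than build a mapping data flow, a rough alternative is to COPY the new parquet file into a staging table and then MERGE it into the target, all inside the dedicated SQL pool (no external tables involved). Below is a minimal sketch driven from Python with pyodbc; every server, database, table, column, and storage path name is a placeholder you would replace with your own, and it assumes MERGE is available in your dedicated SQL pool and that the target has an `id` column to match on.

```python
# Sketch: stage the new parquet file with COPY INTO, then upsert it with MERGE.
# All names below (workspace, pool, tables, columns, storage URL) are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"   # placeholder Synapse workspace
    "DATABASE=mydedicatedpool;"                  # placeholder dedicated SQL pool
    "UID=sqladminuser;PWD=<password>",           # or switch to Azure AD auth
    autocommit=True,
)
cur = conn.cursor()

# 1) Recreate an empty staging table with the same schema as the target.
cur.execute("IF OBJECT_ID('dbo.Sales_Staging') IS NOT NULL DROP TABLE dbo.Sales_Staging;")
cur.execute("""
CREATE TABLE dbo.Sales_Staging
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS SELECT * FROM dbo.Sales WHERE 1 = 0;
""")

# 2) Load the new parquet file from the data lake into the staging table.
#    COPY INTO reads the file directly, so no external table is needed.
cur.execute("""
COPY INTO dbo.Sales_Staging
FROM 'https://mydatalake.dfs.core.windows.net/files/new_file.parquet'
WITH (FILE_TYPE = 'PARQUET', CREDENTIAL = (IDENTITY = 'Managed Identity'));
""")

# 3) Upsert: update rows that match on the id, insert the ones that don't.
#    (Older dedicated SQL pool versions require the MERGE target to be hash-distributed.)
cur.execute("""
MERGE dbo.Sales AS tgt
USING dbo.Sales_Staging AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.amount = src.amount, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, amount, updated_at) VALUES (src.id, src.amount, src.updated_at);
""")

cur.close()
conn.close()
```

The same COPY INTO + MERGE statements could equally be wrapped in a stored procedure and called from a Synapse pipeline activity, which keeps the upsert logic in the pool while the pipeline only orchestrates it.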