I am working on a project involving incremental data loading, and I need to implement an Azure data warehouse to the following specifications:
Example situation: I have 2 parquet files with the same structure; one of them is in the data lake, and the other has already been loaded into a table in a dedicated SQL pool.
What steps should I go through to end up with a table that merges the 2 files (updating existing rows matched on a specific ID, and inserting new rows when no match is found)?
I would prefer not to use external tables, as their performance is slower.
The target table has to be the same table into which the 2nd parquet file has already been loaded.
You could, in any case, define a simple Synapse pipeline in which you read both the parquet file and the table from the dedicated SQL pool, merge the two data flows, and then sink the result to the target SQL table by means of an upsert.
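If you would rather script the upsert than build a mapping data flow, a rough alternative is to COPY the new parquet file into a staging table and then MERGE it into the target, all inside the dedicated SQL pool (no external tables involved). Below is a minimal sketch driven from Python with pyodbc; every server, database, table, column, and storage path name is a placeholder you would replace with your own, and it assumes MERGE is available in your dedicated SQL pool and that the target has an `id` column to match on.

```python
# Sketch: stage the new parquet file with COPY INTO, then upsert it with MERGE.
# All names below (workspace, pool, tables, columns, storage URL) are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"   # placeholder Synapse workspace
    "DATABASE=mydedicatedpool;"                  # placeholder dedicated SQL pool
    "UID=sqladminuser;PWD=<password>",           # or switch to Azure AD auth
    autocommit=True,
)
cur = conn.cursor()

# 1) Recreate an empty staging table with the same schema as the target.
cur.execute("IF OBJECT_ID('dbo.Sales_Staging') IS NOT NULL DROP TABLE dbo.Sales_Staging;")
cur.execute("""
CREATE TABLE dbo.Sales_Staging
WITH (DISTRIBUTION = ROUND_ROBIN, HEAP)
AS SELECT * FROM dbo.Sales WHERE 1 = 0;
""")

# 2) Load the new parquet file from the data lake into the staging table.
#    COPY INTO reads the file directly, so no external table is needed.
cur.execute("""
COPY INTO dbo.Sales_Staging
FROM 'https://mydatalake.dfs.core.windows.net/files/new_file.parquet'
WITH (FILE_TYPE = 'PARQUET', CREDENTIAL = (IDENTITY = 'Managed Identity'));
""")

# 3) Upsert: update rows that match on the id, insert the ones that don't.
#    (Older dedicated SQL pool versions require the MERGE target to be hash-distributed.)
cur.execute("""
MERGE dbo.Sales AS tgt
USING dbo.Sales_Staging AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.amount = src.amount, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (id, amount, updated_at) VALUES (src.id, src.amount, src.updated_at);
""")

cur.close()
conn.close()
```

The same COPY INTO + MERGE statements could equally be wrapped in a stored procedure and called from a Synapse pipeline activity, which keeps the upsert logic in the pool while the pipeline only orchestrates it.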