Incremental pipeline in Google Cloud Data Fusion


We have a requirement to create an incremental pipeline for BigQuery datasets. How can we create an incremental pipeline in Google Cloud Data Fusion?

FYI: Our data source is a CSV file in Google Cloud Storage.

We also want the Data Fusion pipeline to check whether records already exist in the BigQuery table before inserting them.


There is 1 answer below.

Answer from user3126412:

One option is to partition your input files by date/time and store each partition in a separate directory. Then set the input file path in your File source -> BigQuery sink pipeline as a macro parameter, and pass the actual path in as a runtime argument on each run.

For example, you might have a file structure like the following:

input_files/2023-08-26/
    file1.csv, file2.csv, ...
input_files/2023-08-27/
    fileA.csv, fileB.csv, ...

For the first run, pass input_files/2023-08-26 as the runtime value for the file path argument. On the next run pass input_files/2023-08-27, and so on. Each run then loads only that day's new files into BigQuery.
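
If you trigger runs from an external scheduler, you can start the pipeline through the CDAP REST API that every Data Fusion instance exposes and pass the partition path as the runtime argument. A minimal sketch in Python, assuming a deployed batch pipeline named csv_to_bigquery and a macro named input.path (the pipeline name, argument name, endpoint, and token below are all placeholders, not values from the question):

import datetime
import requests

# Data Fusion instance API endpoint (shown in the instance details page);
# the value below is a placeholder.
CDAP_ENDPOINT = "https://INSTANCE-PROJECT-dot-REGION.datafusion.googleusercontent.com/api"
NAMESPACE = "default"
PIPELINE = "csv_to_bigquery"       # hypothetical pipeline name
TOKEN = "OAUTH2-ACCESS-TOKEN"      # e.g. from: gcloud auth print-access-token

# Compute today's partition directory, e.g. input_files/2023-08-26.
partition = datetime.date.today().isoformat()
runtime_args = {"input.path": f"input_files/{partition}"}

# Batch pipelines run as the DataPipelineWorkflow program; runtime
# arguments are passed in the request body as a JSON map.
resp = requests.post(
    f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}"
    "/workflows/DataPipelineWorkflow/start",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=runtime_args,
    timeout=60,
)
resp.raise_for_status()
print(f"Started {PIPELINE} with input.path={runtime_args['input.path']}")

Alternatively, if you schedule the pipeline inside Data Fusion itself, you can avoid external code by setting the file path macro with the logicalStartTime macro function (for example input_files/${logicalStartTime(yyyy-MM-dd)}), which resolves to the scheduled run's date at runtime.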