Incremental pipeline in Google Cloud Data Fusion


We have a requirement to create an incremental pipeline for BigQuery datasets. How can we create an incremental pipeline in Google Cloud Data Fusion?

FYI: Our data source is a CSV file in Google Cloud Storage.

We also want the Data Fusion pipeline to check whether records already exist in the BigQuery table before inserting them.


There is 1 answer below.

Answer from user3126412:

One option is to partition your input files by date/time and store each partition in a separate directory. Then set the input file path in your File source -> BigQuery sink pipeline as a macro parameter, and pass the actual path in as a runtime argument on each run.

For example, you might have a file structure like the following:

input_files/2023-08-26/
    file1.csv, file2.csv, ...
input_files/2023-08-27/
    fileA.csv, fileB.csv, ...

For the first run, pass input_files/2023-08-26 as the runtime value for the file path argument. On the next run pass input_files/2023-08-27, and so on. Each run then loads only that day's new files into BigQuery.
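
If you trigger runs from an external scheduler, you can start the pipeline through the CDAP REST API that every Data Fusion instance exposes and pass the partition path as the runtime argument. A minimal sketch in Python, assuming a deployed batch pipeline named csv_to_bigquery and a macro named input.path (the pipeline name, argument name, endpoint, and token below are all placeholders, not values from the question):

import datetime
import requests

# Data Fusion instance API endpoint (shown in the instance details page);
# the value below is a placeholder.
CDAP_ENDPOINT = "https://INSTANCE-PROJECT-dot-REGION.datafusion.googleusercontent.com/api"
NAMESPACE = "default"
PIPELINE = "csv_to_bigquery"       # hypothetical pipeline name
TOKEN = "OAUTH2-ACCESS-TOKEN"      # e.g. from: gcloud auth print-access-token

# Compute today's partition directory, e.g. input_files/2023-08-26.
partition = datetime.date.today().isoformat()
runtime_args = {"input.path": f"input_files/{partition}"}

# Batch pipelines run as the DataPipelineWorkflow program; runtime
# arguments are passed in the request body as a JSON map.
resp = requests.post(
    f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}"
    "/workflows/DataPipelineWorkflow/start",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=runtime_args,
    timeout=60,
)
resp.raise_for_status()
print(f"Started {PIPELINE} with input.path={runtime_args['input.path']}")

Alternatively, if you schedule the pipeline inside Data Fusion itself, you can avoid external code by setting the file path macro with the logicalStartTime macro function (for example input_files/${logicalStartTime(yyyy-MM-dd)}), which resolves to the scheduled run's date at runtime.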