We have a requirement to create an incremental pipeline for BigQuery datasets. How can we create an incremental pipeline in Google Cloud Data Fusion?
FYI: our data source is a CSV file stored in Google Cloud Storage.
We also want to check whether records already exist in the BigQuery table before inserting them, within the Cloud Data Fusion pipeline.
One option is to partition your input files by date/time and store each partition in a separate directory. Then set the input file path in your File source -> BigQuery sink pipeline as a macro parameter, and pass the file path in as a runtime argument.
For example, you might have a file structure like the following:
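```
input_files/
├── 2023-08-26/   <- CSV files for August 26
├── 2023-08-27/   <- CSV files for August 27
└── 2023-08-28/   <- and so on
```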
For the first run, pass `input_files/2023-08-26` as the runtime argument for the file path. On the next run, pass `input_files/2023-08-27`, and so on.
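If you want to automate the daily runs instead of supplying the argument by hand, a batch pipeline can be started through the CDAP REST API that Data Fusion exposes, with the runtime arguments passed in the request body. The sketch below is only a minimal illustration: the instance endpoint, the namespace, the pipeline name `csv_to_bigquery_incremental`, the macro name `input.path`, and the bucket `my-bucket` are all assumptions you would replace with your own values.

```python
# Minimal sketch: start a Data Fusion batch pipeline for a given day's
# partition by calling the CDAP REST API with a runtime argument.
# Pipeline name, macro name, bucket, and endpoint below are placeholders.
from datetime import date

import google.auth
import google.auth.transport.requests
import requests

# Instance API endpoint, e.g. from `gcloud beta data-fusion instances describe`.
CDAP_ENDPOINT = "https://<your-instance>-<region>.datafusion.googleusercontent.com/api"
NAMESPACE = "default"
PIPELINE_NAME = "csv_to_bigquery_incremental"  # hypothetical pipeline name


def start_daily_run(run_date: date) -> None:
    # Obtain an OAuth token for the identity running this script.
    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/cloud-platform"]
    )
    credentials.refresh(google.auth.transport.requests.Request())

    # The key must match the macro used in the File source's path field,
    # e.g. the path is set to ${input.path} in the pipeline.
    runtime_args = {
        "input.path": f"gs://my-bucket/input_files/{run_date.isoformat()}"
    }

    resp = requests.post(
        f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE_NAME}"
        "/workflows/DataPipelineWorkflow/start",
        headers={"Authorization": f"Bearer {credentials.token}"},
        json=runtime_args,
    )
    resp.raise_for_status()


if __name__ == "__main__":
    start_daily_run(date.today())
```

A script like this could be triggered once a day (for example from Cloud Scheduler or Cloud Composer), so each run picks up only the new partition directory.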