Creating a datafactory pipeline for updating an insering records

51 Views Asked by At

I have one csv file which has data of 3 months and have a key column but it has empty values and I have a second file which has also three months records but one week after the records from the first file. They have overlapping records.

Now, I want to create a pipeline which find the new records of key columns in the second file and insert in the main database and also check the old records of the key columns where it is missi g and update it with the records from the new second file.

First file

study date report no pat no examname

23/11/2023 WD2451 1345 MRI HIP 25/11/2023 1359 MRI Shoulder 29/11/2023 1754 MRI HIP

Second File

study date report no pat no examname

23/11/2023 WD2451 1345 MRI HIP 01/12/2023 WD1983 1359 MRI Shoulder 04/12/2023 1754 MRI HIP

1

There are 1 best solutions below

2
Bhavani On

Here are two files I have used:

details1.csv:

study date report no pat no examname
23/11/2023 WD2451 1345 MRI HIP
25/11/2023 1359 MRI Shoulder
29/11/2023 1754 MRI HIP

details2.csv:

study date report no pat no examname
23/11/2023 WD2451 1345 MRI HIP
1/12/2023 WD1983 1359 MRI Shoulder
29/11/2023 WD19834 1754 MRI HIP

Copy the first file from the source to the SQL database using the copy activity and insert option.

To find the new records of key columns in the second file and insert them in the main database table, and also check the old records of the key columns where they are missing and update them with the records from the new second file, you can use the upsert option in sink and select study date as the key column, as shown below:

enter image description here

After debugging the pipeline, you will get the updated table according to the second file, as shown below:

enter image description here