Data Migration with DMS: How to update primary keys in target when the update happened in source

31 Views Asked by At

We are using DMS to migrate the data from On-prem to database to S3(Landing). Then we are using glue for transformation, after that it will be loaded in another S3(Raw).

Incremental load will happen based on DMS flag(I, U, D).

When there is a update in composite primary keys, DMS flag will be U. but after join it will be inserted to target instead of update.

For Example,

Id phone number name
1 888888888888 x
2 999999999999999 y

In the above table Id, Phone number are primary keys. after migration target also will have same records.

When there is update happened in source phone number from 888888888888 to 777777777

DMS will capture a CDC data like below

Op Id phone number name
U 1 777777777 x

Based in Op column we can understand that there is updated happened in the source. Normally we will use Primary Key column for join and make changes in other column. But in this case update happened in Primary key column.

When I try to run the script based on Primary it is getting inserted as new row, instead of Updating the existing row. After that I am getting 3 rows but in the source I am having only 2 rows.

Id phone number name
1 888888888888 x
2 999999999999999 y
1 777777777 x

Ideally I should have only 2 rows in the target but I am getting 3. How can we handle this scenario.

We are using DMS to migrate the data from On-prem to database to S3(Landing). Then we are using glue for transformation, after that it will be loaded in another S3(Raw).

Incremental load will happen based on DMS flag(I, U, D).

When there is a update in composite primary keys, DMS flag will be U. but after join it will be inserted to target instead of update.

For Example,

Id phone number name
1 888888888888 x
2 999999999999999 y

In the above table Id, Phone number is primary keys. This is the data in source after migration target also will have same records.

When there is update happened in source phone number from 888888888888 to 777777777

DMS will capture a CDC data like below

Op Id phone number name
U 1 777777777 x

Based in Op column we can understand that there is updated happened in the source. Normally we will use Primary Key column for join and make changes in other column. But in this case update happened in Primary key column.

When I try to run the script based on Primary it is getting inserted as new row, instead of Updating the existing row. After that I am getting 3 rows but in the source I am having only 2 rows.

Id phone number name
1 888888888888 x
2 999999999999999 y
1 777777777 x

Ideally I should have only 2 rows in the target but I am getting 3. How can we handle this scenario.

0

There are 0 best solutions below