I am working on a task where my source is AWS RDS - SQL Server and my target is Azure SQL Server.
There's a table with 80M records in my source that needs to be merged with my target table.
This merging will happen every 15 mins and based on the business key, I need to -
- Update the target table if the key is updated in the source table.
- Insert a new key into the target table.
- Mark
isDeletedastruein the target if the key is no more present in the source.
IMP Note - The source row is hard-deleted and no history is maintained.
Since this merging happens every 15 mins and the source table is pretty big, I use lastUpdated column to select only limited records in the source query of my merge query.
With this, I am able to perfectly handle the "upsert" scenario, but on delete, it is deleting all the records from the target which is not desirable.
I have tried the below option -
Read the entire source table in a temp_table every 15 mins and then perform merge from temp_table to the target table. But this is very costly in terms of processing and time.
Is there any better way to handle this scenario? I am happy to share more information as needed.
I think you can solve the problem by adding new column called
SyncStamp, the idea is, we update or insert the same value for SyncStamp, So the other rows that have not this value should be updated as IsDeleted.I prefer to get the actual timestamp for SyncStamp but you can choose random numbers.