I am using Azure SQL for an Application DB, and am trying to create (and subsequently kep updated) some lightweight tables from the more complete tables the Application uses.
Main reason for this is that these tables will then be sync'd into a reporting DB using Azure Data Sync, which lets you be selective about the tables and columns you sync, but not the rows, so without having the smaller tables to use for sync, 99% of the data sync'd was superfluous to requirements.
I have been able to create the tables initially using SELECT INTO (cut down example of one below):
select * INTO min_users
from users where
id_object in (select id_user_create from min_all_tasks)
But as there is decades worth of data to deal with, this takes a LONG time, so just dropping the table and recreating it every time is not desirable.
I'm looking for a method to be able to incrementally update the tables once built by recording when this was last executed and selecting only rows have have been created or updated since the <date_last_executed>. e.g.
select * INTO min_users
from users where
date_arrive > <date_last_executed> AND
id_object in (select id_user_create from min_all_tasks)
The key goal here is for the subsequent updates to be faster and insert rows if they don't already exists (according to the Primary Key) or updating them if they do, and (obviously) to not have duplicate rows as a result.
is there a relatively straightforward method to achieve this?
Everything I have been able to find just gives me tutorials on using SELECT INTO, or tells me how to perform an UPDATE to specific rows, but not an efficient way of Replacing a row that already exists.
It seems the REPLACE function in MySQL would do what I want, but I can't find any equivalent that will work in Azure SQL (REPLACE in SQL seems like it is more aimed at Find and Replace functions.
To achieve your key goal as insert rows if they don't already exist (according to the Primary Key) or updating them if they do, and (obviously) to not have duplicate rows as a result. In SQL, you may utilize the MERGE command. In order to efficiently synchronize data between tables, you may use the
MERGEcommand, which combines insert and update actions into a single statement.But for that you need to first create table with similar schema.
Example Query:
My execution:
Output: