I currently have a single Microsoft SQL Server (Enterprise Edition) database running on a VM with 32 GB of RAM.
We take data from multiple sources; roughly 500k rows per day are imported into a single table. These individual transactional rows are then used to rebuild several aggregated tables.
The transactional table contains over 30 million rows. Every night, after the latest rows are imported, we drop and rebuild one of the (history) tables, which contains another 30 million rows, and then use it to build several other tables. The process drops the table, rebuilds it with a SELECT ... INTO, and then creates non-clustered indexes (a simplified sketch is below).
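For clarity, the nightly rebuild follows roughly this pattern. The table and column names here are illustrative placeholders, not our real schema:

```sql
-- Simplified sketch of the nightly rebuild (names are placeholders)
IF OBJECT_ID('dbo.FactHistory', 'U') IS NOT NULL
    DROP TABLE dbo.FactHistory;

-- Rebuild the full history table from the transactional table
SELECT  t.TransactionID,
        t.CustomerID,
        t.TransactionDate,
        t.Amount
INTO    dbo.FactHistory
FROM    dbo.Transactions AS t;

-- Recreate the non-clustered indexes afterwards
CREATE NONCLUSTERED INDEX IX_FactHistory_CustomerID
    ON dbo.FactHistory (CustomerID)
    INCLUDE (Amount);

CREATE NONCLUSTERED INDEX IX_FactHistory_TransactionDate
    ON dbo.FactHistory (TransactionDate);
```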
This takes over an hour and hammers the VM: it uses little CPU but consumes all of the available memory.
I have tried rewriting the lengthy stored procedure so that it does not rebuild the tables and instead uses MERGE statements to insert or update only the records for which a transaction was received overnight (sketched below). This is no quicker (and I have seen plenty of documentation suggesting MERGE uses more resources: reads, writes, etc.).
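The MERGE version looks roughly like this. Again, the object names and the @LastLoadDate cut-off are simplified placeholders for what the real procedure does:

```sql
-- Simplified sketch of the incremental MERGE approach (names are placeholders)
DECLARE @LastLoadDate datetime2 = DATEADD(DAY, -1, SYSUTCDATETIME()); -- illustrative cut-off

MERGE dbo.FactHistory AS target
USING (
    -- Only the transactions loaded since the last run
    SELECT TransactionID, CustomerID, TransactionDate, Amount
    FROM   dbo.Transactions
    WHERE  LoadDate >= @LastLoadDate
) AS source
    ON target.TransactionID = source.TransactionID
WHEN MATCHED THEN
    UPDATE SET target.CustomerID      = source.CustomerID,
               target.TransactionDate = source.TransactionDate,
               target.Amount          = source.Amount
WHEN NOT MATCHED BY TARGET THEN
    INSERT (TransactionID, CustomerID, TransactionDate, Amount)
    VALUES (source.TransactionID, source.CustomerID, source.TransactionDate, source.Amount);
```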
I am now at a loss as to how to improve the performance of this, having already used Live Query Statistics to implement the suggested indexes.
I have been looking at partitioning the large (> 30 million row) table and increasing the RAM to 64 GB; a rough idea of what the partitioning might look like is below.
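If I went the partitioning route, I assume it would be date-based and look something like the following. The boundary values, filegroup, and table definition are placeholders for illustration only:

```sql
-- Illustrative sketch of date-based partitioning (boundaries and filegroup are placeholders)
CREATE PARTITION FUNCTION pfFactHistoryByYear (date)
AS RANGE RIGHT FOR VALUES ('2021-01-01', '2022-01-01', '2023-01-01');

CREATE PARTITION SCHEME psFactHistoryByYear
AS PARTITION pfFactHistoryByYear ALL TO ([PRIMARY]);

-- A partitioned version of the history table, keyed on the partitioning column
CREATE TABLE dbo.FactHistoryPartitioned
(
    TransactionID   bigint        NOT NULL,
    CustomerID      int           NOT NULL,
    TransactionDate date          NOT NULL,
    Amount          decimal(18,2) NOT NULL
)
ON psFactHistoryByYear (TransactionDate);

-- Cluster on the partitioning column, aligned to the same scheme
CREATE CLUSTERED INDEX CIX_FactHistoryPartitioned_TransactionDate
    ON dbo.FactHistoryPartitioned (TransactionDate)
    ON psFactHistoryByYear (TransactionDate);
```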
When I run Live Query Statistics there are no further suggestions; however, it is extremely slow and sits at 100% query progress for a long time, even though Activity Monitor suggests it has already moved on to the next query.
Around 30% of the query cost relative to the batch was associated with creating the index on the 30 million row (history) table.
Are there any other suggestions?
Should I go back to using MERGE statements and only update the unique records that were received overnight (rather than rebuilding the whole fact table)?
Any advice would be gratefully received.
Improve performance of an overnight SQL fact table build, following an ETL process involving > 30 million rows.