I have a slowly changing type 2 price change table which I need to reduce the size of to improve performance. Often rows are written to the table even if no price change occurred (when some other dimensional field changed) and the result is that for any product the table could be 3-10x the size it needs to be if it were including only changes in price.
I'd like to compress the table so that it only has contains the first effective date and last expiration date for each price until that price changes that can also
- Deal with an unknown number of rows of the same price
- Deal with products going back to an old price
As an example if i have this raw data:
| Product | Price Effective Date | Price Expiration Date | Price |
|---|---|---|---|
| 123456 | 6/22/18 | 9/19/18 | 120 |
| 123456 | 9/20/18 | 11/8/18 | 120 |
| 123456 | 11/9/18 | 11/29/18 | 120 |
| 123456 | 11/30/18 | 12/6/18 | 120 |
| 123456 | 12/7/18 | 12/19/18 | 85 |
| 123456 | 12/20/18 | 1/1/19 | 85 |
| 123456 | 1/2/19 | 2/19/19 | 85 |
| 123456 | 2/20/19 | 2/20/19 | 120 |
| 123456 | 2/21/19 | 3/19/19 | 85 |
| 123456 | 3/20/19 | 5/22/19 | 85 |
| 123456 | 5/23/19 | 10/10/19 | 85 |
| 123456 | 10/11/19 | 6/19/19 | 80 |
| 123456 | 6/20/20 | 12/31/99 | 80 |
I need to transform it into this:
| Product | Price Effective Date | Price Expiration Date | Price |
|---|---|---|---|
| 123456 | 6/22/18 | 12/6/18 | 120 |
| 123456 | 12/7/18 | 2/19/19 | 85 |
| 123456 | 2/20/19 | 2/20/19 | 120 |
| 123456 | 2/21/19 | 10/10/19 | 85 |
| 123456 | 10/11/19 | 12/31/99 | 80 |
This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.
The code to reconstruct the data is:
Note that the logic for date arithmetic varies depending on the database.
Save this result into a temporary table,
temp_tor whatever, usingselect into,create table as, or whatever your database supports.Then empty the current table and reload it:
Notes:
truncate_table!