Is there SQL logic to reduce a type 2 table along a dimension?


I have a type 2 slowly changing dimension table of price changes that I need to shrink to improve performance. Rows are often written to the table even when no price change occurred (because some other dimensional field changed), so for any product the table can be 3-10x larger than it would be if it included only price changes.

I'd like to compress the table so that each row holds the first effective date and the last expiration date of a price, up to the point where that price changes. The solution also needs to:

  • Deal with an unknown number of rows of the same price
  • Deal with products going back to an old price

For example, if I have this raw data:

Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               9/19/18                120
123456   9/20/18               11/8/18                120
123456   11/9/18               11/29/18               120
123456   11/30/18              12/6/18                120
123456   12/7/18               12/19/18               85
123456   12/20/18              1/1/19                 85
123456   1/2/19                2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               3/19/19                85
123456   3/20/19               5/22/19                85
123456   5/23/19               10/10/19               85
123456   10/11/19              6/19/20                80
123456   6/20/20               12/31/99               80

I need to transform it into this:

Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               12/6/18                120
123456   12/7/18               2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               10/10/19               85
123456   10/11/19              12/31/99               80

Gordon Linoff (best answer)

This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.

The code to reconstruct the data is:

select product, price,
       min(effective_date) as effective_date,
       max(expiration_date) as expiration_date
from (select t.*,
             -- start a new group unless the previous row with the same price ended the day before
             sum(case when prev_expiration_date = effective_date - interval '1 day' then 0 else 1 end)
                 over (partition by product order by effective_date) as grp
      from (select t.*,
                   -- expiration date of the previous row for the same product and price
                   lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
            from t
           ) t
     ) t
group by product, price, grp;

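Note that the syntax for date arithmetic varies by database: effective_date - interval '1 day' is Postgres syntax, while SQL Server would use dateadd(day, -1, effective_date).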
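To try the query end to end, here is a minimal sketch that runs it against the sample data in an in-memory SQLite database from Python. It is an illustration under assumptions, not the answer's exact code: it needs SQLite 3.25 or later for window functions, stores dates as ISO strings, spells the one-day subtraction as date(effective_date, '-1 day'), reads 12/31/99 as 9999-12-31, and takes the 10/11/19 row to expire on 6/19/20.

```python
import sqlite3

# Sample rows from the question: (product, effective_date, expiration_date, price).
# Assumptions: ISO dates, 12/31/99 -> 9999-12-31, the 10/11/19 row expires 2020-06-19.
ROWS = [
    (123456, "2018-06-22", "2018-09-19", 120),
    (123456, "2018-09-20", "2018-11-08", 120),
    (123456, "2018-11-09", "2018-11-29", 120),
    (123456, "2018-11-30", "2018-12-06", 120),
    (123456, "2018-12-07", "2018-12-19", 85),
    (123456, "2018-12-20", "2019-01-01", 85),
    (123456, "2019-01-02", "2019-02-19", 85),
    (123456, "2019-02-20", "2019-02-20", 120),
    (123456, "2019-02-21", "2019-03-19", 85),
    (123456, "2019-03-20", "2019-05-22", 85),
    (123456, "2019-05-23", "2019-10-10", 85),
    (123456, "2019-10-11", "2020-06-19", 80),
    (123456, "2020-06-20", "9999-12-31", 80),
]

# The gaps-and-islands query, with SQLite's date(x, '-1 day') for the date arithmetic.
COMPRESS_SQL = """
select product, price,
       min(effective_date) as effective_date,
       max(expiration_date) as expiration_date
from (select t.*,
             -- a row continues its island only if the previous same-price row ended the day before
             sum(case when prev_expiration_date = date(effective_date, '-1 day')
                      then 0 else 1 end)
                 over (partition by product order by effective_date) as grp
      from (select t.*,
                   lag(expiration_date) over (partition by product, price
                                              order by effective_date) as prev_expiration_date
            from t) t) t
group by product, price, grp
order by product, effective_date
"""


def compress(conn: sqlite3.Connection) -> list:
    """Return one (product, price, effective_date, expiration_date) row per price run."""
    return conn.execute(COMPRESS_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table t (product integer, effective_date text, "
             "expiration_date text, price integer)")
conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
for row in compress(conn):
    print(row)
```

The five rows printed correspond to the expected output in the question, with ISO dates. The 2/20/19 row at price 120 stays separate from the earlier 120 run because the previous 120 row ended on 12/6/18, not on 2/19/19.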

Save this result into a temporary table, say temp_t, using select into, create table as, or whatever your database supports.

Then empty the current table and reload it:

truncate table t;

insert into t
    select product, price, effective_date, expiration_date
    from temp_t;

Notes:

  • Validate the data before using truncate table!
  • If the table has triggers or columns with default values, be careful: the reload fires any insert triggers, and columns not listed in the insert receive their default values.
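A minimal sketch of the save, validate, and reload steps, again assuming SQLite driven from Python. SQLite has no TRUNCATE TABLE, so DELETE FROM stands in for it. The validation rule (every original row must fall inside exactly one compressed row with the same product and price) is a hypothetical, illustrative check rather than something the answer prescribes. The delete and insert share one transaction, so a failure partway through leaves t as it was.

```python
import sqlite3

# SQLite spelling of the gaps-and-islands query; the column aliases matter,
# because "create table ... as" takes its column names from them.
COMPRESS_SQL = """
select product, price, min(effective_date) as effective_date,
       max(expiration_date) as expiration_date
from (select t.*,
             sum(case when prev_exp = date(effective_date, '-1 day') then 0 else 1 end)
                 over (partition by product order by effective_date) as grp
      from (select t.*,
                   lag(expiration_date) over (partition by product, price
                                              order by effective_date) as prev_exp
            from t) t) t
group by product, price, grp
"""

# Hypothetical check: count original rows NOT inside exactly one compressed
# row with the same product and price.
UNCOVERED_SQL = """
select count(*) from t
where (select count(*) from temp_t c
       where c.product = t.product and c.price = t.price
         and t.effective_date >= c.effective_date
         and t.expiration_date <= c.expiration_date) <> 1
"""


def reload_compressed(conn: sqlite3.Connection) -> None:
    conn.execute("drop table if exists temp_t")
    conn.execute("create temp table temp_t as " + COMPRESS_SQL)
    uncovered = conn.execute(UNCOVERED_SQL).fetchone()[0]
    if uncovered:
        raise ValueError(f"{uncovered} rows not covered; t left unchanged")
    # The delete and insert share one transaction: committed together on
    # success, rolled back together if an exception escapes the block.
    with conn:
        conn.execute("delete from t")  # SQLite has no TRUNCATE TABLE
        conn.execute("insert into t (product, price, effective_date, expiration_date) "
                     "select product, price, effective_date, expiration_date from temp_t")


conn = sqlite3.connect(":memory:")
conn.execute("create table t (product integer, effective_date text, "
             "expiration_date text, price integer)")
# Hypothetical rows: the price goes 10, 10, 12, then back to 10.
conn.executemany("insert into t values (?, ?, ?, ?)", [
    (1, "2024-01-01", "2024-01-31", 10),
    (1, "2024-02-01", "2024-02-29", 10),
    (1, "2024-03-01", "2024-03-31", 12),
    (1, "2024-04-01", "9999-12-31", 10),
])
conn.commit()
reload_compressed(conn)
print(conn.execute("select * from t order by effective_date").fetchall())
```

After the reload, t holds three rows: the two January/February rows at 10 are merged, while the return to 10 in April stays a separate row.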
user177933

It sounds like you are asking for a temporal schema, where for a given date you can look up the price of an asset?

This is done with two tables: price_current and price_history.

price_current:
price_id  item_id  price  rec_created
1         1        100    '2015-04-18'

price_history:
price_id  item_id  from          to            price
1         1        '2001-01-01'  '2004-05-01'  114
1         1        '2004-05-01'  '2015-04-18'  102

i.e. for any item, you can find out when its price was set without polluting your "current" table. For this to work you will need an UPDATE trigger on price_current: when a record is updated, the trigger inserts the previous details into the history table, along with the period they were valid for.

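-- SQL Server (T-SQL) syntax, since the answer uses GETDATE();
-- "deleted" holds each row as it was before the update
CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO price_history (price_id, item_id, [from], [to], price)
    SELECT price_id, item_id, rec_created, GETDATE(), price
    FROM deleted;
END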

Now current and historical prices are kept separate, and your current table (presumably the busier one) no longer grows out of hand from maintaining historical state. Hope I understood the question.
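The two-table pattern can be tried out in SQLite from Python with a minimal sketch. It is not the answer's code: SQLite triggers read OLD and NEW row values instead of a separate table of updated rows, the from/to columns are renamed to the hypothetical valid_from/valid_to (FROM and TO are SQL keywords), and, so the example is deterministic, the trigger closes the archived period at the new row's rec_created rather than at the current time. The hypothetical price_on helper answers "what was the price on this date?" by checking the history intervals and then the current row.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table price_current (price_id integer primary key, item_id integer,
                            price integer, rec_created text);
create table price_history (price_id integer, item_id integer,
                            valid_from text, valid_to text, price integer);

-- Archive the old version of every updated row. Assumption: each update also
-- sets rec_created to the change date, so the archived period is
-- [old.rec_created, new.rec_created).
create trigger trg_price_current_update
after update on price_current
for each row
begin
    insert into price_history (price_id, item_id, valid_from, valid_to, price)
    values (old.price_id, old.item_id, old.rec_created, new.rec_created, old.price);
end;
""")


def set_price(conn: sqlite3.Connection, price_id: int, price: int, as_of: str) -> None:
    """Hypothetical helper: change a price and stamp the change date."""
    with conn:
        conn.execute("update price_current set price = ?, rec_created = ? "
                     "where price_id = ?", (price, as_of, price_id))


def price_on(conn: sqlite3.Connection, item_id: int, day: str) -> Optional[int]:
    """Price of item_id on an ISO date, or None if it had no price yet."""
    row = conn.execute("""
        select price from price_history
        where item_id = ? and valid_from <= ? and ? < valid_to
        union all
        select price from price_current
        where item_id = ? and rec_created <= ?""",
        (item_id, day, day, item_id, day)).fetchone()
    return row[0] if row else None


with conn:
    conn.execute("insert into price_current values (1, 1, 114, '2001-01-01')")
set_price(conn, 1, 102, "2004-05-01")
set_price(conn, 1, 100, "2015-04-18")
print(conn.execute("select * from price_current").fetchall())
print(conn.execute("select * from price_history order by valid_from").fetchall())
print(price_on(conn, 1, "2010-06-30"))
```

After these three steps the two tables hold exactly the rows shown in the answer's example, and price_on(conn, 1, "2010-06-30") returns 102. The history intervals are half-open, so a change date belongs to the new price, not the old one.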

To ignore 'dummy' updates, just alter the trigger to skip updates where the price didn't change (if the DBMS doesn't handle that already). To be honest, this could easily be done on the application side, but to manage it via the trigger:

-- SQL Server (T-SQL) syntax: only archive the old row when the price actually changed
CREATE OR ALTER TRIGGER trg_price_current_update
ON price_current
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO price_history (price_id, item_id, [from], [to], price)
    SELECT d.price_id, d.item_id, d.rec_created, GETDATE(), d.price
    FROM deleted d
    INNER JOIN inserted i ON i.price_id = d.price_id
    WHERE i.price <> d.price;
END

i.e. for each updated record ("deleted" holds the old version, "inserted" the new one), we insert the previous version into the history table, provided its price is different from the new price.

(Edited to include the new trigger. I also changed the date held in rec_created: it must be the date the row was created, not the first time that product had a price assigned to it; that was a mistake. As for the dates, I was too lazy to write out the full DD-MM-YYYY hh:mm:ss:zzz, but full timestamps would generally be useful in BETWEEN queries.)
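In SQLite the same filter can be written as a WHEN clause on the trigger, so the history insert only runs when the price really changes. A minimal sketch, assuming hypothetical valid_from/valid_to column names (FROM and TO are keywords), a hypothetical description column standing in for "some other dimensional field", and that each real price change also sets rec_created to the change date. It compares with IS NOT rather than <>, so a change from or to NULL also counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table price_current (price_id integer primary key, item_id integer,
                            price integer, rec_created text, description text);
create table price_history (price_id integer, item_id integer,
                            valid_from text, valid_to text, price integer);

-- Only archive when the price changed; IS NOT is SQLite's NULL-safe inequality.
create trigger trg_price_current_update
after update on price_current
for each row when old.price is not new.price
begin
    insert into price_history (price_id, item_id, valid_from, valid_to, price)
    values (old.price_id, old.item_id, old.rec_created, new.rec_created, old.price);
end;
""")

with conn:
    conn.execute("insert into price_current values (1, 1, 114, '2001-01-01', 'blue widget')")
    # 'Dummy' update: another dimensional field changes, the price does not.
    conn.execute("update price_current set description = 'large blue widget' "
                 "where price_id = 1")
    # Real price change, stamped with the change date.
    conn.execute("update price_current set price = 102, rec_created = '2004-05-01' "
                 "where price_id = 1")

print(conn.execute("select * from price_history").fetchall())
```

Only the real price change reaches price_history; the description-only update is skipped by the WHEN clause.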

user177933

Something along the lines of:

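WITH keeps AS
(
    SELECT p.product_id, p.effective, p.expires, p.price,
           -- 1 if the entry starting the day after this one expires has a different price
           CASE WHEN EXISTS (SELECT 1 FROM prices p1
                             WHERE p1.product_id = p.product_id
                               AND p1.effective = DATEADD(DAY, 1, p.expires)
                               AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_after,
           -- 1 if the entry expiring the day before this one starts has a different price
           CASE WHEN EXISTS (SELECT 1 FROM prices p1
                             WHERE p1.product_id = p.product_id
                               AND p1.expires = DATEADD(DAY, -1, p.effective)
                               AND p1.price <> p.price) THEN 1 ELSE 0 END AS has_before
    FROM prices p
)
SELECT product_id, effective, expires, price
FROM keeps
WHERE has_after = 1
   OR has_before = 1
UNION -- not UNION ALL, so the first entry is not listed twice if it is also a boundary
SELECT p.product_id, p.effective, p.expires, p.price
FROM prices p
WHERE p.effective = (SELECT MIN(p1.effective) FROM prices p1 WHERE p1.product_id = p.product_id);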

What's it doing:

Find all the entries that border an entry with a different price: either the next entry's effective date is this entry's expiry date + 1 day, or the previous entry's expiry date is this entry's effective date - 1 day, and that neighboring entry's price is different. This gives us all the actual changes in price. But that misses the first price entry, so we simply add it to the results.

e.g.:

product_id  effective  expires   price  has_before  has_after
123456      6/22/18    9/19/18   120    0           0
123456      9/20/18    11/8/18   120    0           0
123456      11/9/18    11/29/18  120    0           0
123456      11/30/18   12/6/18   120    0           1
123456      12/7/18    12/19/18  85     1           0
123456      12/20/18   1/1/19    85     0           0
123456      1/2/19     2/19/19   85     0           1
123456      2/20/19    2/20/19   120    1           1
123456      2/21/19    3/19/19   85     1           0
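Here is a minimal sketch that reproduces the has_before/has_after columns by running the flag logic in SQLite from Python. It assumes ISO dates (12/31/99 read as 9999-12-31, the 10/11/19 row expiring on 6/19/20), uses SQLite's date(x, '+1 day') and date(x, '-1 day') in place of DATEADD, and matches adjacent entries on product_id as well as on dates.

```python
import sqlite3

# Sample rows from the question: (product_id, effective, expires, price).
# Assumptions: ISO dates, 12/31/99 -> 9999-12-31, the 10/11/19 row expires 2020-06-19.
ROWS = [
    (123456, "2018-06-22", "2018-09-19", 120),
    (123456, "2018-09-20", "2018-11-08", 120),
    (123456, "2018-11-09", "2018-11-29", 120),
    (123456, "2018-11-30", "2018-12-06", 120),
    (123456, "2018-12-07", "2018-12-19", 85),
    (123456, "2018-12-20", "2019-01-01", 85),
    (123456, "2019-01-02", "2019-02-19", 85),
    (123456, "2019-02-20", "2019-02-20", 120),
    (123456, "2019-02-21", "2019-03-19", 85),
    (123456, "2019-03-20", "2019-05-22", 85),
    (123456, "2019-05-23", "2019-10-10", 85),
    (123456, "2019-10-11", "2020-06-19", 80),
    (123456, "2020-06-20", "9999-12-31", 80),
]

# has_before: the entry ending the day before this one starts has a different price.
# has_after:  the entry starting the day after this one ends has a different price.
# SQLite's EXISTS yields 1 or 0, so no CASE is needed here.
FLAGS_SQL = """
select p.product_id, p.effective, p.expires, p.price,
       exists (select 1 from prices p1
               where p1.product_id = p.product_id
                 and p1.expires = date(p.effective, '-1 day')
                 and p1.price <> p.price) as has_before,
       exists (select 1 from prices p1
               where p1.product_id = p.product_id
                 and p1.effective = date(p.expires, '+1 day')
                 and p1.price <> p.price) as has_after
from prices p
order by p.product_id, p.effective
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table prices (product_id integer, effective text, "
             "expires text, price integer)")
conn.executemany("insert into prices values (?, ?, ?, ?)", ROWS)
flags = conn.execute(FLAGS_SQL).fetchall()
for row in flags:
    print(row)
```

The first nine flag pairs match the table above. As written, the query returns the rows at either end of each price run rather than merged ranges, so pairing each start row with the following end row would still be needed to reach the target output.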
Ajax1234

You can first find the intervals where the price does not change, and then group on those intervals:

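-- c counts the price changes up to and including each row, so it is constant within a run of one price
with to_r as (select row_number() over (partition by product order by effective) r, t.* from data_table t),
     to_flag as (select t.*, case when t.price = lag(t.price) over (partition by t.product order by t.r) then 0 else 1 end chg from to_r t),
     to_group as (select t.*, (select sum(t1.chg) from to_flag t1 where t1.product = t.product and t1.r <= t.r) c from to_flag t)
select t.product, min(t.effective) effective, max(t.expiration) expiration, max(t.price) price
from to_group t group by t.product, t.c order by t.product, min(t.r);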

Output:

Product  Price Effective Date  Price Expiration Date  Price
123456   6/22/18               12/6/18                120
123456   12/7/18               2/19/19                85
123456   2/20/19               2/20/19                120
123456   2/21/19               10/10/19               85
123456   10/11/19              12/31/99               80
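To double-check the expected output without any particular SQL dialect, here is a minimal Python sketch of the same idea: sort each product's rows by effective date, find the runs where the price does not change, and collapse each run to its first effective date and last expiration date. itertools.groupby only merges adjacent rows with equal keys, so a product returning to an old price starts a new run. Assumptions: ISO date strings (so they sort correctly as text), 12/31/99 read as 9999-12-31, the 10/11/19 row expiring on 6/19/20, and rows that are contiguous in time; gaps between rows are not detected.

```python
from itertools import groupby
from typing import List, Tuple

Row = Tuple[int, str, str, int]  # (product, effective, expiration, price)


def compress_runs(rows: List[Row]) -> List[Row]:
    """Collapse consecutive same-price rows of each product into one row."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    out: List[Row] = []
    # groupby starts a new group whenever (product, price) changes between adjacent rows
    for (product, price), run in groupby(ordered, key=lambda r: (r[0], r[3])):
        run_rows = list(run)
        out.append((product, run_rows[0][1], run_rows[-1][2], price))
    return out


ROWS: List[Row] = [
    (123456, "2018-06-22", "2018-09-19", 120),
    (123456, "2018-09-20", "2018-11-08", 120),
    (123456, "2018-11-09", "2018-11-29", 120),
    (123456, "2018-11-30", "2018-12-06", 120),
    (123456, "2018-12-07", "2018-12-19", 85),
    (123456, "2018-12-20", "2019-01-01", 85),
    (123456, "2019-01-02", "2019-02-19", 85),
    (123456, "2019-02-20", "2019-02-20", 120),
    (123456, "2019-02-21", "2019-03-19", 85),
    (123456, "2019-03-20", "2019-05-22", 85),
    (123456, "2019-05-23", "2019-10-10", 85),
    (123456, "2019-10-11", "2020-06-19", 80),
    (123456, "2020-06-20", "9999-12-31", 80),
]

for row in compress_runs(ROWS):
    print(row)
```

The design choice here is that run boundaries come purely from row order, which is the same assumption any "group consecutive equal prices" query makes; the date-matching approach in the best answer is stricter because it also splits runs at gaps in time.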