How do I do a SQL join to get the latest data from table1 as of the date in table2?

562 Views Asked by At

I have two tables, call them "monthlyStoreCount" and "weeklySales"

monthlyStoreCount

date storeCount
2022-01-01 89
2022-02-01 94
... ...

weeklySales

date sales
2021-12-31 66
2022-01-07 16
2022-01-14 147
2022-01-21 185
2022-01-28 145
2022-04-04 2572
... ...

I am looking to join these tables to get the "storeCount" and latest "sales" as of the dates in the monthlyStoreCount table.

Is there any performant way to do this join? With the data shown the desired output would be:

date storeCount sales
2022-01-01 89 66
2022-02-01 94 145
... ... ...
2

There are 2 best solutions below

3
xQbert On

UNTESTED:

Using: https://docs.snowflake.com/en/sql-reference/constructs/join-lateral.html as a primer...

"for each row in left_hand_table LHT: execute right_hand_subquery RHS using the values from the current row in the LHT"

Lateral allows us to execute the sub query for each record in the Monthly Store Count. So we get the MSC record whose date is >= ws date. ordered by weekly sales date descending and get the 1st record (the one closest to the monthly store count date which is equal to or before that date.)

SELECT MSC.Date, MSC.StoreCount,  sWS.Sales
FROM monthlyStoreCount as MSC, 
     LATERAL (SELECT WS.Sales 
              FROM WeeklySales as WS 
              WHERE MSC.date>= WS.date 
              ORDER BY WS.Date DESC LIMIT 1) as sWS
ORDER BY MSC.Date ASC;
4
Kathmandude On

Instead of using a cartesian product, what if you stack them up and look for the date that occurs right before the date for monthly store counts?

with cte as
(select date, storeCount, 1 as is_monthly
 from monthlyStoreCount 
 union all
 select date, sales, 0 as is_monthly
 from weeklySales)
  

select *, lag(storeCount) over (order by date asc, is_monthly asc)
from cte
qualify is_monthly=1;

Hmm....It appears there is one way to make xQbert's lateral join solution work. By slapping an aggregate on it. I don't know why Snowflake doesn't allow the same using limit/top 1.

select *
from monthlyStoreCount as m,
     lateral (select array_agg(w.sales) within group(order by w.date desc)[0] as sales
              from WeeklySales as w 
              where m.date>= w.date)