I have two delta live tables (events and sales). These are streaming tables having a common field (sale_id). I need to reconcile these tables to ensure that each sale_id in the events table has a match in the sales table or not. All the records from the events table that has no match in the sales table needs to be routed to an error table. The error table will then be used the next day in the reconciliation process.
So the process would look like.
select *, CASE
WHEN sales.sale_num IS NOT NULL OR error.sale_num IS NOT NULL THEN 1
ELSE 0
END AS zero_or_one from events
left outer join sales
on events.sale_id = sales.sale_id
left outer join error
on event.sale_id = error.sale_id
But there are two issues I am facing
- Left outer join is not possible between two stream sources unless a watermark column is provided. The datasets do not have proper watermark columns. I am planning to overcome this issue by doing a series of inner joins.
- The error table does not exist initially and this needs to be created as part of the pipeline. Can the error table be created initially in the DLT pipeline and then updated with the new error records identified as part of the reconciliation process. When I tried to do this, I got an error saying that the error table cannot be defined twice in the pipeline.
Could anyone please help me with suggestions on the issues I am facing.