In our daily ETL load, we are loading a year back worth of history and that window changes every day to be one year back from the MAX(Incremental Datetime) value. Sometimes there are records in the fact data that are updated historically and we will pick up that change and reprocess.
Additionally, one of the related dimensions is a Type 2 dim and changes occur pretty frequently. A change to the Type 2 dimension record may have occurred in between when we originally pulled the fact data and when we reprocessed, causing a new key for that specific dimension record/natural key where IsCurrent = 1.
We MERGE the fact data from an integration layer to a presentation layer on the surrogate keys from the related dims. However, since a Type 2 change has occurred we now have a new key for the related dim and our MERGE detects this as a new record instead of seeing that it already exists. Now we have duplicate records in the fact table.
I'm thinking we could resolve this issue by doing the following:
- Add the natural key of the dimension record to the fact table and change the MERGE to use the non-type 2 keys and the natural key of this table as what makes it unique for the MERGE.
- Change the Type 2 key on the fact MERGE to a Type 0 (essentially), where it's inserted if it's detected as a new record but the value is never updated.
There may be other options here but this is what I thought of first to handle this issue. I think this is a pretty common use case, I'm just wondering what the best practice approach would be. Ideally we wouldn't be reprocessing fact data from a year back, but that's the way it goes here.