In SSIS, if an incoming dataset has multiple records for the same Business Key, how do I load it to the dimensions table with SCD type 2 without using the SCD Wizard.
Sample dataset
Customer ID Name Segment Postal Code
1 James Corporate 50026
2 Andrew Consumer 33311
3 Steven Consumer 90025
2 Andrew Consumer 33306
3 Steven Consumer 90032
1 James Corporate 50087
3 Steven Consumer 90000
In my case, if I try Loading the dimension table with other SSIS components (Lookup/Conditional Split) all the record show up a new row in the table because they are all coming in all at the same time.
I have ‘CurrentFlag’ as the indicator of the current record.
In SSIS, if I have an incoming dataset that has multiple records for the same Business Key, How do I get to recognize these, and set the CurrentFlag as necessary, whether or not a record in the target table has that Business Key already?
Thanks.
OK, this is a massive simplification because SCD's are very challenging to correctly implement. You will need to sit down and think critically about this. My answer below only handles ongoing daily processing - it does not explain how to handle historical files being re-processed, which could potentially result in duplicate records with different EffectiveStart and End Dates.
By definition, you will have an existing record source component (i.e., query from the database table) and an incoming data source component (i.e., a *.csv flatfile). You will need to perform a merge join to identify new records versus existing records. For existing records, you will need to determine if any of the columns have changed (do this in a Derived Column transformation).
You will need to also include two columns for EffectiveStartDate and EffectiveEndDate.
Note on 12-31-9999: This is effectively the Y10K bug. But, it allows users to query the database between date ranges without having to consciously add ISNULL(GETDATE()) in the WHERE clause of a query in the event that they are querying between date ranges.
This will prevent the dates on the columns from overlapping, which could potentially result in multiple records being returned for a given date.
To determine if a record has changed, create a new column called RecordChangedInd of type Bit.
Then, in your split condition you can create two outputs:
RecordHasChanged(this will be an INSERT) andRecordHasNotChanged(this will be an UPDATE to deactivate the exiting record and an INSERT).You can conceivably route both inputs to the same INSERT destination. But, you will need to be careful suppress the update record's ExistingEffectiveEndDate value that deactivates the date.