I'm attempting to re-create Google Analytics "Last Non-Direct Click" marketing attribution methodology on top of Amplitude data set. In basic terms, this means if a user has a session where the channel is direct, it looks backwards in time and finds the most recent non-direct channel value.
I have a table that looks roughly like this. I also have the timestamp of the session start. Session ID is in order with which the session occurred, ASC.
| Session ID | Marketing Channel |
|---|---|
| 1 | direct |
| 2 | paid |
| 3 | direct |
| 4 | organic search |
| 5 | direct |
I want to create a third column where the values would look like this
| Session ID | Marketing Channel | Last Non-Direct Click Channel |
|---|---|---|
| 1 | direct | direct |
| 2 | paid | paid |
| 3 | direct | paid |
| 4 | organic search | organic search |
| 5 | direct | organic search |
| 6 | direct | organic search |
| 7 | direct | organic search |
I need to take any value in the Channel Tactic column where = to "Direct" and replace it with the previous Channel Tactic value that is not <> direct. It has to be the most recent one historically, it cannot be just any that is not direct from the past. So in the above example, row 5 "direct" cannot be either paid or organic search, it must be organic search.
I was thinking of leveraging a window function join but I can't think of way to make sure I'm only grabbing the most recent non-direct channel, not just any random non-direct channel. Keep in mind my data set is huge and most users have many sessions with multiple different marketing channels.
I'm looking for a solution that will be compatible in Snowflake, thank you!
Edit: I thought of a way to implement this as a single window function without self-joining.
This works because for the purposes of the
lagfunction, we're treating the worddirectas null using theifffunction. Thecasestatement indicates that if it's the first row, use the CHANNEL even if it isdirect. From then on, if the CHANNEL isdirect, lag back to the previous non-null (nondirectusing the iff function) value.