LAG function to return previous rows by discarding null values

36 Views Asked by At

I am trying to find the previous pages visited by a person from web table. I am using lag function to find previous pages that the person has visited only if there was a next page visited.

web table:

id visit_time webpage_visited
1 2024-03-14 10:00:01 google.com
1 2024-03-14 10:00:07
1 2024-03-14 10:01:15
1 2024-03-14 10:01:10 espn.com
1 2024-03-14 10:02:01

When I use below SQL, the values returned are not considering the null/blank values between the rows.

SQL used:

select id, 
visit_time, 
webpage_visited, 
coalesce(lag(webpage_visited, 1) over (partition by id order by visit_time  asc), 'none') as previous_webpage_visited
from web

Please advice how I can achieve the below expected output.

Expected output:

id visit_time webpage_visited previous_webpage_visited
1 2024-03-14 10:00:01 google.com None
1 2024-03-14 10:00:07
1 2024-03-14 10:01:15
1 2024-03-14 10:01:10 espn.com google.com
1 2024-03-14 10:02:01
1

There are 1 best solutions below

10
Guru Stron On BEST ANSWER

The trick consists of two parts - use IGNORE NULLS for your window function and apply the window function only if the current value is not null:

sample data
WITH dataset(id, visit_time, webpage_visited) AS (
   values (1,'2024-03-14 10:00:01','google.com'),
        (1,'2024-03-14 10:00:07',NULL),
        (1,'2024-03-14 10:01:15',NULL),
        (1,'2024-03-14 10:01:10','espn.com'),
        (1,'2024-03-14 10:02:01',NULL)
)

-- query
select id,
       visit_time,
       webpage_visited,
       if(webpage_visited is not null,
           coalesce(lag(webpage_visited, 1) IGNORE NULLS over(partition by id order by visit_time), 'none'))
           as previous_webpage_visited
from dataset
order by visit_time;

Output:

id visit_time webpage_visited previous_webpage_visited
1 2024-03-14 10:00:01 google.com none
1 2024-03-14 10:00:07 NULL NULL
1 2024-03-14 10:01:10 espn.com google.com
1 2024-03-14 10:01:15 NULL NULL
1 2024-03-14 10:02:01 NULL NULL

UPD

To handle empty/whitespace entries you can preprocess them, for example with subquery:

select id,
       visit_time,
       webpage_visited,
       if(webpage_visited_sanitized is not null,
           coalesce(lag(webpage_visited_sanitized, 1) IGNORE NULLS over(partition by id order by visit_time), 'none'))
           as previous_webpage_visited
from (select *,
             if(trim(webpage_visited) = '', null, webpage_visited) webpage_visited_sanitized
      from dataset
     );