How to concatenate N rows of a column based on a specific column value using a condition in Google BigQuery?

47 Views Asked by At
TimeStamp USER ID string_col
1 100001 Here
2 100001 there
3 100001 Apple
4 200002 this is
5 200002 that is
6 200002 Apple
7 200002 some
8 200002 summer
9 200002 winter
10 200002 Apple

That is my raw table & I want to use the word "Apple" as a trigger word to concatenate ONLY N (For this example 2) rows above that for every userID.

Below table is the output I'm looking for:

TimeStamp USER ID string_col Result
1 100001 Here null
2 100001 there null
3 100001 Apple Here There
4 200002 this is null
5 200002 that is null
6 200002 Apple this is that is
7 200002 some null
8 200002 summer null
9 200002 winter null
10 200002 Apple summer winter
2

There are 2 best solutions below

0
blackraven On BEST ANSWER

For the passion of problem solving and coding, I would like to share this simple solution to see if it helps to answer your question:

SELECT
  t1."TimeStamp",
  t1."USER ID",
  t1."string_col",
  (
    SELECT
      STRING_AGG(t2."string_col", ' ')
    FROM
      raw_table AS t2
    WHERE
      t2."USER ID" = t1."USER ID"
      AND t2."TimeStamp" < t1."TimeStamp"
      AND t1."string_col" = 'Apple'
      AND t2."string_col" != 'Apple'
      AND t1."TimeStamp" - t2."TimeStamp" <= 2
  ) AS Result
FROM
  raw_table AS t1

The output looks like this:
enter image description here

3
Mikhail Berlyant On

Use below

SELECT * EXCEPT(grp),
  CASE 
    WHEN string_col = 'Apple'
    THEN STRING_AGG(string_col, ' ') OVER win
  END AS Result
FROM (
  SELECT *, 
    COUNTIF(string_col = 'Apple') OVER win AS grp
  FROM your_table
  WINDOW win AS (PARTITION BY user_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WINDOW win AS (PARTITION BY user_id, grp ORDER BY ts ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING)     

with output

enter image description here