How to create a key:value pair or 2 sets of lists returning item ids and corresponding number of purchases in sql (or snowflake to be more specific)?

Question

How to create a key:value pair or 2 sets of lists returning item ids and corresponding number of purchases in sql (or snowflake to be more specific)?

428 Views Asked by Salih K At 11 March 2023 at 06:23

I have an enormous clickstream data which I can pull the timestamp, user_ids corresponding item_ids that customers clicked.

I am trying to create a dataset that will have the unique customer ids (user_id) on each row and another column that has key:value pair that provides me the list of items and number of times corresponding item is clicked like a python dictionary (or this can be in two columns with items and corresponding num. clicks as a sorted list from most frequent to least). The number of clicks for an item is the times that item_id occur for a fixed user_id. Basically, every occurrence of item is one click. If it occurs 10 times for a fixed user_id, that means that click_count = 10 for that user_id, item_id pair.

Here is an example:

USER_ID	CLICKED_ITEMS
user1	item3: 20, item2: 10, item1: 5
user2	item1: 10, item2: 1, item3: 1
user3	item2: 1
user4	item1: 10, item2: 2

or

USER_ID	CLICKED_ITEMS	NUM_CLICKS
user1	item3, item2, item1	20, 10, 5
user2	item1, item2, item3	10, 1, 1
user3	item2	1
user4	item1, item2	0, 2

Is this possible using SQL (esp. for Snowflake syntax)? TIA

I tried

SELECT user_id, using ARRAY_UNIQUE_AGG(item_id) AS clicked_items
FROM clickstream_table
GROUP_BY user_id

and got unique user_ids and corresponding list of items but not sure how to find number of times each item is clicked. I can do this easily using pandas but unfortunately I cannot even pull 1 month of data due to size and timeout limitations (3hr limit) on warehouse.

Original Q&A

There are 1 best solutions below

**Simeon Pilgrim** · Answer 1 · 2023-03-11T08:24:53.477000

Assuming you know how to aggregate your data to be like the following:

with data (user_id, item_name, click_count) as (
    select * from values
        ('user1', 'item1', 5),
        ('user1', 'item2', 10),
        ('user1', 'item3', 20),
        
        ('user2', 'item1', 10),
        ('user2', 'item2', 1),
        ('user2', 'item3', 1),

        ('user3', 'item2', 1),
        
        ('user4', 'item2', 10),
        ('user4', 'item2', 2)
)

the results you are after can be done with ARRAY_AGG and OBJECT_CONSTRUCT

select 
     user_id, 
     array_agg(object_construct(item_name, click_count)) within group (order by click_count desc) as clicked_items   
from data
group by 1
order by 1;

gives:

USER_ID	CLICKED_ITEMS
user1	[ { "item3": 20 }, { "item2": 10 }, { "item1": 5 } ]
user2	[ { "item1": 10 }, { "item2": 1 }, { "item3": 1 } ]
user3	[ { "item2": 1 } ]
user4	[ { "item2": 10 }, { "item2": 2 } ]

or if you want it to look "exactly" like you have shown, then the second option can be done with two LISTAGG

select 
     user_id, 
     listagg(item_name, ', ') within group (order by click_count desc) as clicked_items,
     listagg(click_count, ', ') within group (order by click_count desc) as num_clicks 
from data
group by 1
order by 1;

giving:

USER_ID	CLICKED_ITEMS	NUM_CLICKS
user1	item3, item2, item1	20, 10, 5
user2	item1, item2, item3	10, 1, 1
user3	item2	1
user4	item2, item2	10, 2

well, the first output can be created with string concatenations also.

select 
     user_id, 
     listagg(item_name||': '||click_count, ', ') within group (order by click_count desc) as clicked_items   
from data
group by 1
order by 1;

gives:

USER_ID	CLICKED_ITEMS
user1	item3: 20, item2: 10, item1: 5
user2	item1: 10, item2: 1, item3: 1
user3	item2: 1
user4	item2: 10, item2: 2

How to create a key:value pair or 2 sets of lists returning item ids and corresponding number of purchases in sql (or snowflake to be more specific)?

There are 1 best solutions below

Related Questions in SQL

Related Questions in SNOWFLAKE-CLOUD-DATA-PLATFORM

Related Questions in BIGDATA

Related Questions in KEY-VALUE

Related Questions in CLICKSTREAM

Trending Questions

Popular # Hahtags

Popular Questions