Finding the most commonly associated items for every item within a list

I'm really struggling to find an answer to this: I'm trying to find the items most commonly sold together with each item in a list. I've managed to get my data looking something like this:

   order_number     item_name
0        517640         [nan]
1        517660           [a]
2        517663        [a, b]
3        517665  [a, c, d, e]
4        517666  [c, a, b, d]

The code I'm currently using is:

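import itertools
import pandas as pd

# Build the list of item pairs for each order.
combinations_list = []
for row in items.item_name:
    # Sort so that (a, c) and (c, a) count as the same pair; drop NaN entries.
    names = sorted(x for x in row if pd.notna(x))
    combinations_list.append(list(itertools.combinations(names, 2)))

# Flatten to one pair per row (orders with fewer than two items yield NaN).
combination_counts = pd.Series(combinations_list).explode().reset_index(drop=True)

# The 50 most common pairs; value_counts() ignores NaN by default.
combination_counts.value_counts()[:50]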

This returns the 50 most common pairs that appear in this list (I think).

Any ideas how I can get the list to show the 3 most commonly sold items alongside every item we currently sell?

Any help would be greatly appreciated.

Cheers

Answer by AudioBubble:

For every item, you need to keep the list of counts of the other items appearing in the same order. Scan the list of orders and increment the counters for every pair. In the end, just report the three items with the largest count in each list.
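The scan described above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the function name `top_co_sold`, the parameter `k`, and the sample `orders` list (mirroring the question's example rows) are assumptions made here. It also assumes that missing values are NaN and that an item listed twice in one order should count once.

```python
from collections import Counter, defaultdict
from itertools import permutations


def top_co_sold(orders, k=3):
    """Map each item to the k items most often found in the same order."""
    co_counts = defaultdict(Counter)  # item -> Counter of co-sold items
    for order in orders:
        # NaN is the only value not equal to itself, so this drops missing
        # entries; the set removes duplicates within a single order.
        items = sorted({item for item in order if item == item})
        # Increment the counter in both directions for every pair.
        for item, other in permutations(items, 2):
            co_counts[item][other] += 1
    return {item: counts.most_common(k) for item, counts in co_counts.items()}


# Hypothetical data shaped like the item_name column in the question.
orders = [[float("nan")], ["a"], ["a", "b"], ["a", "c", "d", "e"], ["c", "a", "b", "d"]]
result = top_co_sold(orders)
print(result["b"])
```

With a DataFrame like the one in the question, `top_co_sold(items.item_name)` should work the same way, since it only iterates over the lists. Items that never share an order with anything else do not appear in the result.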

For implementation, if the lists are sparse (a given item appears with only a few other ones), use a dictionary per item. Otherwise, use an array of counts.
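For the dense case, one possible sketch of the array-of-counts variant is shown below, using plain nested lists so it needs no extra libraries. The names `dense_cooccurrence` and `top_k_dense` and the sample data are assumptions for illustration only; the memory cost grows with the square of the number of distinct items, so this only makes sense when most items really do co-occur.

```python
def dense_cooccurrence(orders):
    """Return (items, counts) where counts[i][j] is how often items i and j share an order."""
    # Assign each distinct, non-missing item a row/column index.
    items = sorted({item for order in orders for item in order if item == item})
    index = {item: i for i, item in enumerate(items)}
    n = len(items)
    counts = [[0] * n for _ in range(n)]
    for order in orders:
        ids = {index[item] for item in order if item == item}
        for i in ids:
            for j in ids:
                if i != j:  # an item is not counted against itself
                    counts[i][j] += 1
    return items, counts


def top_k_dense(items, counts, k=3):
    """For each item, list up to k (other_item, count) pairs with the largest counts."""
    top = {}
    for i, item in enumerate(items):
        ranked = sorted(range(len(items)), key=lambda j: counts[i][j], reverse=True)
        top[item] = [(items[j], counts[i][j]) for j in ranked[:k] if counts[i][j] > 0]
    return top


# Hypothetical data shaped like the item_name column in the question.
orders = [[float("nan")], ["a"], ["a", "b"], ["a", "c", "d", "e"], ["c", "a", "b", "d"]]
items_index, matrix = dense_cooccurrence(orders)
dense_result = top_k_dense(items_index, matrix)
print(dense_result["e"])
```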