Python Dictionary: Checking if items in a list are present in any of the lists in the dictionary


I have a dictionary containing transactions like so:

transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

I then have a list of items like so:

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]

What I am trying to figure out is how to count the occurrences of these items in the transactions dictionary, i.e. in how many transactions all elements of an item appear. In this scenario, for example, I would be returning a dictionary that looks something like:

{('B','C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}

I have the following function:

def get_num_occurrences(items, transactions):
    occurr = dict()
    for x in items:
        occurr[x] = 0
    for transaction in transactions.values():
        for item in transaction:
            occurr[item] += 1
    return occurr

This works for 1-item itemsets (if the list of items were instead items = ["A", "B", "C", "D", "E"]). But I cannot figure out how to apply the same method to 2-item itemsets (items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]), or to 3-item itemsets (items = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]), and so on.



Answer by Matthias (accepted)

Use sets to determine if your item is a subset of the transaction.

transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]

result = {}
for item in items:
    count = 0
    for transaction in transactions.values():
        if set(item).issubset(set(transaction)):
            count += 1
    result[item] = count

print(result)

The result is {('B', 'C'): 3, ('B', 'D'): 2, ('B', 'E'): 4, ('C', 'D'): 1, ('C', 'E'): 3, ('D', 'E'): 3}.
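
The loop above rebuilds set(transaction) once for every item. A minimal variant, assuming the transactions do not change while counting, converts each transaction to a frozenset a single time and then uses the <= (subset) operator. The function name count_itemsets is hypothetical and not part of the answer; it should reproduce the same dictionary.

```python
transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]


def count_itemsets(items, transactions):
    # Convert each transaction to a set once, instead of once per item.
    transaction_sets = [frozenset(t) for t in transactions.values()]
    result = {}
    for item in items:
        item_set = frozenset(item)
        # `a <= b` is True when every element of a is also in b;
        # summing the booleans counts the matching transactions.
        result[item] = sum(item_set <= t for t in transaction_sets)
    return result


print(count_itemsets(items, transactions))
```

This only pays off when there are many items per transaction; for a handful of items the original loop is just as readable.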


With a dictionary comprehension you can write all of this in one line.

result = {item: sum(set(item).issubset(set(t)) for t in transactions.values()) for item in items}
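
The one-liner works because issubset returns a bool and sum counts each True as 1. Because it only checks whether the whole item is contained in a transaction, the same line handles the 3-item itemsets from the question unchanged. The set.issubset method also accepts any iterable, so the transaction list can be passed directly; that small simplification and the name items3 are assumptions, not part of the original answer.

```python
transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

# Hypothetical name for the 3-item itemsets mentioned in the question.
items3 = [('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')]

# issubset() accepts any iterable, so the transaction list needs no set() call.
result = {item: sum(set(item).issubset(t) for t in transactions.values()) for item in items3}
print(result)  # {('B', 'C', 'D'): 1, ('C', 'D', 'E'): 1, ('A', 'C', 'E'): 1}
```
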
Answer by heatherfuke

This way it really doesn't matter how many elements are in your item, and you can use an efficient dictionary lookup.

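dictionary = {}  # itemset -> number of transactions containing it

def count_items(item):
    # Efficient dictionary lookup: increment an existing count or start at 1.
    if item in dictionary:
        dictionary[item] += 1
    else:
        dictionary[item] = 1

# Count an item once for every transaction that contains all of its elements.
for transaction in transactions.values():
    for item in items:
        if set(item).issubset(transaction):
            count_items(item)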
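
Another way to get dictionary lookups regardless of itemset size, sketched here as an assumption rather than what this answer's code does, is to enumerate the k-item combinations of each transaction with itertools.combinations and look each one up in a dictionary of counters. It relies on every transaction list and every item tuple being sorted the same way (as they are in the question), because combinations emits tuples in input order. The body below is hypothetical; it reuses the question's function name and expects single items written as 1-tuples such as ('A',).

```python
from itertools import combinations

transactions = {
    "T1": ["A", "B", "C", "E"],
    "T2": ["A", "D", "E"],
    "T3": ["B", "C", "E"],
    "T4": ["B", "C", "D", "E"],
    "T5": ["B", "D", "E"]
}

items = [('B', 'C'), ('B', 'D'), ('B', 'E'), ('C', 'D'), ('C', 'E'), ('D', 'E')]


def get_num_occurrences(items, transactions):
    # Start every itemset at 0 so unseen itemsets still appear in the result.
    occurr = {item: 0 for item in items}
    # The itemset sizes we need to enumerate (e.g. {2} or {3}, or a mix).
    sizes = {len(item) for item in items}
    for transaction in transactions.values():
        for k in sizes:
            # Every k-element subset of this transaction, in transaction order.
            for combo in combinations(transaction, k):
                if combo in occurr:  # dictionary lookup, O(1) on average
                    occurr[combo] += 1
    return occurr


print(get_num_occurrences(items, transactions))
print(get_num_occurrences([('B', 'C', 'D'), ('C', 'D', 'E'), ('A', 'C', 'E')], transactions))
```

The number of combinations per transaction grows quickly with transaction length and k, so for long transactions the subset check from the accepted answer may well be cheaper.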