How do I merge two dictionaries, keeping the max value for common keys?


I have two dictionaries that look like:

{'r': 2, 'e': 4, 'h': 2, 'k': 4}

and

{'r': 2, 'e': 5, 'y': 2, 'h': 2}

How do I get a dictionary that has all the keys, but where a key appears in both initial dictionaries, keeps the higher value for that key? I want a dictionary that looks like this:

{'e': 5, 'k': 4, 'y': 2, 'h': 2, 'r': 2}

None of the previous answers helped me.

Jab On BEST ANSWER

You can use itertools.chain to combine the items of both dictionaries, then itertools.groupby to collect the values for each key, and take the max of each group. The merged data must be sorted by key before it is passed to groupby, because groupby only groups consecutive items that share a key.

I'm using operator.itemgetter to get the keys and values instead of lambdas. operator is part of the standard library, so it adds no external dependency. You can replace the itemgetter calls with lambdas if you prefer, although I wouldn't advise it: lambdas are typically a little slower and there is no real need for them here.

from itertools import chain, groupby
from operator import itemgetter

data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}

get_key, get_val = itemgetter(0), itemgetter(1)
merged_data = sorted(chain(data1.items(), data2.items()), key=get_key)

output = {k: max(map(get_val, g)) for k, g in groupby(merged_data, key=get_key)}

print(output)

{'e': 5, 'h': 2, 'k': 4, 'r': 2, 'y': 2}
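To illustrate why the sort matters, here is a small sketch that runs the same comprehension with and without sorting. The dictionaries are chained in the opposite order (data2 first) purely for illustration, so that the lower value for 'e' arrives last; the names pairs, unsorted_result and sorted_result are not part of the answer above.

```python
from itertools import chain, groupby
from operator import itemgetter

data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}

get_key, get_val = itemgetter(0), itemgetter(1)

# Chain data2 first so that the smaller 'e' value (4) comes last.
pairs = list(chain(data2.items(), data1.items()))

# Without sorting, equal keys are not adjacent, so groupby yields a separate
# one-item group per occurrence and the last occurrence overwrites the rest.
unsorted_result = {k: max(map(get_val, g)) for k, g in groupby(pairs, key=get_key)}

# With sorting, all items for a key are adjacent and max sees every value.
sorted_pairs = sorted(pairs, key=get_key)
sorted_result = {k: max(map(get_val, g)) for k, g in groupby(sorted_pairs, key=get_key)}

print(unsorted_result['e'])  # 4 (the max was lost)
print(sorted_result['e'])    # 5
```

In the unsorted case the result depends on the order in which the items happen to arrive, which is why the answer sorts first.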

Another alternative is collections.defaultdict. To make sure the output is also correct when the values may be negative, use float('-inf') as the default value:

from collections import defaultdict

output = defaultdict(lambda: float('-inf'))

for d in (data1, data2):
    for k, v in d.items():
        output[k] = max(output[k], v)

print(dict(output))

{'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}
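As a rough sketch of why the default matters, the snippet below wraps the same loop in a helper and compares a default of 0 with float('-inf') on made-up dictionaries with negative values. The helper name merge_max and the sample data neg1 and neg2 are assumptions for illustration only.

```python
from collections import defaultdict

def merge_max(dicts, default):
    """Same loop as above, with the default value passed in (hypothetical helper)."""
    output = defaultdict(lambda: default)
    for d in dicts:
        for k, v in d.items():
            output[k] = max(output[k], v)
    return dict(output)

# Made-up sample data with only negative values.
neg1 = {'a': -3, 'b': -1}
neg2 = {'a': -2, 'c': -5}

with_zero = merge_max((neg1, neg2), 0)
with_neg_inf = merge_max((neg1, neg2), float('-inf'))

print(with_zero)     # {'a': 0, 'b': 0, 'c': 0}
print(with_neg_inf)  # {'a': -2, 'b': -1, 'c': -5}
```

With a default of 0, every negative value loses to the default, so the real data never appears in the output.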

Or, without any imports, dict.setdefault can take the place of defaultdict:

output = {}

for d in (data1, data2):
    for k, v in d.items():
        output.setdefault(k, float('-inf'))
        output[k] = max(output[k], v)
        
print(output)

{'r': 2, 'e': 5, 'h': 2, 'k': 4, 'y': 2}

Lastly, using pandas:

import pandas as pd

data1 = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
data2 = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
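
# Build a one-row DataFrame per dict, stack them, and take the column-wise max (missing keys become NaN and are skipped)
res = pd.concat(map(pd.DataFrame, ([data1], [data2]))).max().astype(int).to_dict()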
shauli On

You can build the union of both dictionaries' keys, and then use a dict comprehension that takes the maximum value for each key:

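a = {'r': 2, 'e': 4, 'h': 2, 'k': 4}  # the two dictionaries from the question
b = {'r': 2, 'e': 5, 'y': 2, 'h': 2}
keys = set(a.keys()).union(b.keys())
output = {k: max(a.get(k, float('-inf')), b.get(k, float('-inf'))) for k in keys}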
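The same idea can be sketched as a small function that accepts any number of dictionaries and avoids the float('-inf') sentinel by only looking at dictionaries that actually contain the key. The function name merge_max is an assumption for illustration, not something from the answer.

```python
def merge_max(*dicts):
    """Merge dictionaries, keeping the largest value seen for each key (hypothetical helper)."""
    # Iterating a dict yields its keys, so this is the union of all key sets.
    keys = set().union(*dicts)
    # For each key, take the max over only those dicts that contain it.
    return {k: max(d[k] for d in dicts if k in d) for k in keys}

a = {'r': 2, 'e': 4, 'h': 2, 'k': 4}
b = {'r': 2, 'e': 5, 'y': 2, 'h': 2}

output = merge_max(a, b)
print(output == {'e': 5, 'k': 4, 'y': 2, 'h': 2, 'r': 2})  # True
```

Because no numeric sentinel is involved, this variant should also work for values that are comparable but not numbers, such as strings.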