Remove redundant tuples from dictionary based on the score

55 Views Asked by At

I wonder if there is a fast way to remove redundant tuples from dictionary. Suppose I have a dictionary as below:

a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],

    'actor': {
    'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745), ('car', 0.6297345748705596)],

    'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],

    'fri': [('fri', 1.0), ('compats', 1.0), ('mo', 0.814126196299157), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('bro', 0.8496284151715676),  ('fami', 0.6375584385858655), ('best fri', 0.807654599975373)]
            }
    }

In this dictionary for example we have tuples like: ('car', 0.7759192750865143) for key 'trans' and ('car', 0.6297345748705596) for key 'autori'. I want to remove the tuple ('car', 0.6297345748705596) because it has a lower score.

My desired output is:

new_a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],

    'actor': {
    'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745)],

    'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],

    'fri': [('fri', 1.0), ('compats', 1.0), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('best fri', 0.807654599975373)]
            }
    }

Is there a fast way to do this or we still need to loop through all values for each key?

2

There are 2 best solutions below

2
Dorian Turba On

To remove lower values, you need to detect duplicate, compare, keep track of the higher value, and remove value if a bigger one is found. The algorithm you want is at least time O(n) and space O(n).

0
Driftr95 On

Not sure it's the most efficient, but since you also mentioned "a simple solution" in a comment....

I think the simplest method would involve looping through every tuple twice: once to collect best scores, and then again to filter everything else. Something like new_a = onlyBest( a, bestRef=dict(sorted(getAllPairs(a))) ) [see function definitions below].

def getAllPairs(obj):
    if isinstance(obj, tuple) and len(obj)==2: return [obj]
    allPairs = []
    if isinstance(obj, dict): obj = obj.values()
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        for i in obj: allPairs += getAllPairs(i)
    return allPairs

def onlyBest(obj, bestRef:dict):
    if isinstance(obj, list):
      # if all(isinstance(i, tuple) and len(i)==2 for i in obj):
        return [i for i in obj if not i[1] < bestRef.get(i[0],i[1])]
    if isinstance(obj, dict):
        return {k: onlyBest(v,bestRef) for k, v in obj.items()}
    return obj