How exactly is the __lt__ method invoked here?

162 Views Asked by At

I was trying to solve Top K Frequent Words, specifically in O(n Log k) time. Apparently the solution is:

from collections import Counter
from heapq import heappush, heappop


class Pair:
    def __init__(self, word, freq):
        self.word = word
        self.freq = freq

    def __lt__(self, p):
        return self.freq < p.freq or (self.freq == p.freq and self.word > p.word)


class Solution:
    def topKFrequent(self, words: List[str], k: int) -> List[str]:
        cnt = Counter(words)
        h = []
        for word, freq in cnt.items():
            heappush(h, Pair(word, freq))
            if len(h) > k:
                heappop(h)
        return [p.word for p in sorted(h, reverse=True)]

I understand most of what's going on here. The only thing that I can't figure out is the __lt__ method. I understand that we are overriding the __lt__ method, but how is it being implemented here? How is it being invoked?

My best guess is most of the invocation is done "under the cover". That is, when new Pair objects are pushed onto the heap, __lt__ will be invoked to compare frequencies and since it returns a boolean, depending on whether the return value is True or False, the heap will adjust accordingly?

But then, is __lt__ being invoked again in the last return statement when we use sorted()?

The problem requires that we return the top K words with the highest frequencies and if there is a tie (words have the same frequency), then sort those words lexicographically. I'm trying to understand where the sorting is exactly done? It seems like both the frequency sorting and lexicographically sorting is done within the __lt__ method?

I walked through the code and used the debugger to see the order of execution of the code. I can't seem to figure out how/when exactly the magic method (__lt__) comes into play and when the lexicographical sorting is take care of.

I've found some previous threads that discuss this very problem but could not find an explanation/answer that sheds some light on the __lt__ method.

2

There are 2 best solutions below

0
SimonUnderwood On

lt is lowercase LT and not It. lt stands for "less than". the __lt__ method is invoked whenever a < comparison is made. For example, x < y will call x.__lt__(y) under the hood. I am not totally sure, but my guess is that the sorted() function uses __lt__ to compare and order items in a list.

0
akhildevelops On

__lt__ is a dunder (magic method) that is invoked for < symbol in python expression.

heapq internally uses < symbol to re-adjust the list everytime an element is added to the heap. You can find it here in the source repo of python of heapq module

I can't comment on how sorted internally (i.e, does it use __lt__) but I presume it uses __lt__ otherwise we won't get the sorted list of the heap.