I'd like to sort a list and observe/visualize how Python's sorting algorithm Timsort is moving the elements around.
First attempt: A subclass of list which prints itself after each change:
class List(list):
def __setitem__(self, index, value):
list.__setitem__(self, index, value)
print(self)
That works when I change elements myself, but during sort ... nothing:
>>> a = List([None] * 2)
>>> a[0] = 'S'
['S', None]
>>> a[1] = 'P'
['S', 'P']
>>> a.sort()
>>>
Second attempt: Print the list (in global variable a) at each comparison of elements:
class Str(str):
def __lt__(self, other):
print(a)
return other > self
That does something, but the list is always... empty?
>>> a = list(map(Str, 'test'))
>>> a.sort()
[]
[]
[]
[]
[]
[]
>>>
Why do these attempts fail, and is there a way to observe what Timsort is doing?
Yes, it can be observed. I did, here's a visualization:
And another one with more elements:
Why do your attempts fail?
listsubclass fails becausesortworks "inside" the list at C code level, not going through the public Python__setitem__interface.But then how can we observe the list during the sorting, if it's empty? Easy: We don't. Not during the sorting. But after partial sortings.
Timsort is:
So the idea is:
Code doing that (except it doesn't show duplicate states, which happens when Timsort makes several comparisons before the next move):
Output, discussed below:
In the above output, note there are three stages:
1356>ABCDGIJOQRSTVWZabcfijknrsvz. Timsort does that with binary insertion sort. Each line in the output corresponds to one insertion.024789<EFHKLMNPUXYdeghlmopqtuwxy. Again with binary insertion sort.1356>ABCDGIJOQRSTVWZabcfijknrsvz024789<EFHKLMNPUXYdeghlmopqtuwxy01356>ABCDGIJOQRSTVWZabcfijknrsvz24789<EFHKLMNPUXYdeghlmopqtuwxyThe way Timsort really merges the two halves is to move the first half out, to temporary space. And then merge the two halves into the given list from left to right. So really after the
0from the second half gets moved to the front, the list looks like this:0--------------------------------24789<EFHKLMNPUXYdeghlmopqtuwxyWith all the dashes being unoccupied space. Now when I raise my exception, Timsort doesn't just leave the list like that, but nicely at least moves the first halve's remaining elements back into that unoccupied space. And that's why in my output, it looks like Timsort is moving the
0to the left and shifting the entire first half to the right by one index. That would be inefficient, and it's not what happens when Timsort runs normally, without my exceptions.You can also see these three stages in my animated image at the top. And in this "screensaver" version, where I scroll down through the above output. I think it looks cool (was hoping it would look somewhat like Matrix code rain) but I find it less clear:
Note that here the third stage merges from right to left (with the right half moved out to temp space), which Timsort does when that's better.
Code for generating that image using Pillow, after storing the states in list
statesinstead of printing them: