Why is builtin sorted() slower for a list containing descending numbers if each number appears twice consecutively?

Question

Why is builtin sorted() slower for a list containing descending numbers if each number appears twice consecutively?

2.8k Views Asked by no comment At 05 March 2024 at 15:21

I sorted four similar lists. List d consistently takes much longer than the others, which all take about the same time:

a:  33.5 ms
b:  33.4 ms
c:  36.4 ms
d: 110.9 ms

Why is that?

Test script (Attempt This Online!):

from timeit import repeat

n = 2_000_000
a = [i // 1 for i in range(n)]  # [0, 1, 2, 3, ..., 1_999_999]
b = [i // 2 for i in range(n)]  # [0, 0, 1, 1, 2, 2, ..., 999_999]
c = a[::-1]                     # [1_999_999, ..., 3, 2, 1, 0]
d = b[::-1]                     # [999_999, ..., 2, 2, 1, 1, 0, 0]

for name in 'abcd':
    lst = globals()[name]
    time = min(repeat(lambda: sorted(lst), number=1))
    print(f'{name}: {time*1e3 :5.1f} ms')

Original Q&A

There are 2 best solutions below

no comment On 05 March 2024 at 16:34

As @jpa's answer explained, a, b and c have a single "run" while d has a million. We can also see the effect of that by counting the comparisons:

a:  1999999 comparisons, 1.000 per element
b:  1999999 comparisons, 1.000 per element
c:  1999999 comparisons, 1.000 per element
d: 10710650 comparisons, 5.355 per element

A single long run means we only compare each pair of neighbors once, that's all. For n elements, that's n-1 comparisons.

Since a and b are ascending, they're already sorted, and nothing further is done. Since c is descending, it simply gets reversed, and then nothing further is done.

For d, the two-element runs first get combined into runs of 32 to 64 elements using binary insertion sort. And then these longer runs get merged again and again into larger and larger runs until the whole list is sorted.

I tried various values of n. The number of comparisons for d hovered around 5 per element, going up to ~5.3 until n reached a power of 2, after which the number of comparisons fell to ~4.7 again. In general, d requires O(n) comparisons. It doesn't need n log n comparisons, because the merging uses "galloping" (an exponential search), which in this case needs only logarithmically many comparisons. So unlike ordinary merge sort in general cases, the recurrence relation for the number of comparisons here isn't T(n) = 2T(n/2) + n but T(n) = 2T(n/2) + log(n), which is O(n).

The runtime however isn't O(n), as there are still n log n moves of elements. But that's low-level and very fast, relatively fast compared to the O(n) slower comparisons. If you measure time for various n, you might very well think it's O(n) time.

The script for counting (Attempt This Online!):

n = 2_000_000
a = [i // 1 for i in range(n)]
b = [i // 2 for i in range(n)]
c = a[::-1]
d = b[::-1]

class Int:
    def __init__(self, value):
        self.value = value
    def __lt__(self, other):
        global comparisons
        comparisons += 1
        return self.value < other.value

for name in 'abcd':
    lst = globals()[name]
    comparisons = 0
    sorted(map(Int, lst))
    print(f'{name}: {comparisons} comparisons, {comparisons/len(lst) :5.3f} per element')

**jpa** · Accepted Answer · 2024-03-05T15:42:30.610000

As alluded to in the comments by btilly and Amadan, this is due to how the Timsort sorting algorithm works. Detailed description of the algorithm is here.

Timsort speeds up operation on partially sorted arrays by identifying runs of sorted elements.

A run is either "ascending", which means non-decreasing:

a0 <= a1 <= a2 <= ...

or "descending", which means strictly decreasing:

a0 > a1 > a2 > ...

Note that a run is always at least 2 long, unless we start at the array's last element.

Your arrays a, b and c each consist of just one run. The array d has 1 million runs.

The reason why the descending run cannot be >= is to make the sort stable, i.e. keep the order of equal elements:

The definition of descending is strict, because the main routine reverses a descending run in-place, transforming a descending run into an ascending run. Reversal is done via the obvious fast "swap elements starting at each end, and converge at the middle" method, and that can violate stability if the slice contains any equal elements. Using a strict definition of descending ensures that a descending run contains distinct elements.

Python 3.11 has slightly improved version of timsort, sometimes called powersort, but it uses the same run detection and thus has the same performance.

Why is builtin sorted() slower for a list containing descending numbers if each number appears twice consecutively?

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in ALGORITHM

Related Questions in PERFORMANCE

Related Questions in SORTING

Related Questions in TIME-COMPLEXITY

Trending Questions

Popular # Hahtags

Popular Questions