Python has two ways to compute the difference between two sets: the - operator and the .difference() method. I'm using this code to time each of them:
import timeit
print(timeit.timeit('''
a.difference({b})
''', setup='''
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b = 3
'''))
# => 0.2423834060318768
print(timeit.timeit('''
a - {b}
''', setup='''
a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b = 3
'''))
# => 0.2027170000365004
When I run this in CPython I get this:
0.24530324200168252
0.205820870003663
This made sense to me because .difference() can take any iterable, not just sets. However, when I run it in PyPy I get this:
0.14613953093066812
0.23659668595064431
The times are completely flipped, so surely it can't be because .difference() can take any iterable. What is the difference between the implementations of .difference() and -? Is there any difference between CPython's and PyPy's implementations?
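For reference, the point about .difference() accepting any iterable while - needs a set on both sides can be checked with a small snippet like the one below. It only illustrates the documented behaviour of the built-in set type; the variable names are made up for the example.

```python
a = {1, 2, 3, 4, 5}

# .difference() accepts any iterable, and more than one of them.
from_list = a.difference([3, 4])
from_multiple = a.difference([1], (2,), {3})

# The - operator only works when the other operand is a set or frozenset.
try:
    a - [3, 4]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(from_list, from_multiple, operator_accepts_list)
```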
In PyPy, there is an optimization that is invoked by difference() but not by __sub__(): when you use a.difference(b) and b is a smaller set than a, it copies a completely and then removes the items of b from the copy. Otherwise, it starts from an empty set and adds the items of a that are not in b. For some reason, __sub__() doesn't go through the path that selects between these two implementations and always picks the second logic.
Please report it on https://foss.heptapod.net/pypy/pypy/-/issues and it will likely be fixed.
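The two strategies described above can be sketched in plain Python. This is only an illustration of the idea, not PyPy's actual source; the function names (difference_copy_and_remove, difference_build_up, difference_adaptive) are hypothetical.

```python
def difference_copy_and_remove(a, b):
    """Copy a, then discard every item of b from the copy."""
    result = set(a)
    for item in b:
        result.discard(item)  # no error if the item is not in the copy
    return result


def difference_build_up(a, b):
    """Start from an empty set and add the items of a that are not in b."""
    result = set()
    for item in a:
        if item not in b:
            result.add(item)
    return result


def difference_adaptive(a, b):
    """Pick a strategy by size (assumed selection rule, per the description)."""
    if len(b) < len(a):
        return difference_copy_and_remove(a, b)
    return difference_build_up(a, b)


a = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
b = {3}

# Both strategies give the same result as the built-in operator.
assert difference_copy_and_remove(a, b) == a - b
assert difference_build_up(a, b) == a - b
assert difference_adaptive(a, b) == a - b
```

In the benchmark from the question, b has one element and a has ten, so the first strategy does a bulk copy plus a single removal, while the second does ten membership checks and nine insertions. That would be consistent with .difference() being faster than - on PyPy if - always takes the second path.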