Suppose we have a dictionary d defined as follows
d = {i:i for i in range(1,1000001)}
And then we store d.items() in x. So, x is a collection of tuples that contain key-value pairs for each element of d. I made a list with the same tuples, defined as follows :
l = [(i, i) for i in range(1,1000001)]
Now, I compared the time taken to execute (1000001, 1000001) in x and (1000001, 1000001) in l. The time difference was very significant. I assumed that this was because x had a set-like behavior and implemented hash tables on tuples, as tuples are hashable as well.
However, tuples that contain lists are non hashable, so I tried the same thing with this dictionary :
d = {i:[i,i+1] for i in range(1,1000001)}
Unlike what I expected, it still enabled fast lookup.
How does this work?
PS : Here's my code for time comparison
import time
def for_dict_items(n):
#d = {i:i for i in range(1, n+1)}
d = {i:[i, i+1] for i in range(1, n+1)}
di = d.items()
st = time.time()
x = (n, [n, n+1]) in di
et = time.time()
return (et - st)
def for_tuples_list(n):
#l = [(i, i) for i in range(1, n+1)]
l = [(i,[i, i+1]) for i in range(1, n+1)]
st = time.time()
x = (n, [n, n+1]) in l
et = time.time()
return (et - st)
k = 1000000
t1 = for_dict_items(k)
t2 = for_tuples_list(k)
print(t1, t2, t2/t1, sep = "\n")
Since dict is one of the most important core objects, it's implemented in C, as well as
dict_values,dict_keysanddict_items. Here's the CPython implementation ofdictitems_contains:If you don't speak C, here's what it does:
The core of this is the fact that
dict.itemsreturns not a list, but a specialdict_itemsview. The term view here means that it does not represent a copy, but only a proxy to retrieve underlying items in some specific way. The same applies todict_keysanddict_values, you can read more in the excellent documentation page on this topic.Note that this is specific to CPython implementation and may be wrong for other implementations.