numpy.intersect1d does not work on dictionary.keys()


I tried to use NumPy's intersect1d to compare the keys of two dictionaries. However, it always returns an empty intersection, for some reason related to dictionary keys being objects. I want to know why this behavior would be desirable in any way.

import numpy as np

d1 = {'a':1, 'b':2}
d2 = {'b':2, 'c':3}
np.intersect1d(d1.keys(), d2.keys())
> array([], dtype=object)

However,

np.intersect1d(list(d1.keys()), list(d2.keys()))
> array(['b'], dtype='<U1')

Is this intended behavior and if so, why?


Answer by mozway (accepted)

dict.keys() returns a special set-like object, a dynamic view on the dictionary's keys (see doc). Such views do behave "unexpectedly"; for instance, you can't index or slice them like a list: d1.keys()[0] raises a TypeError.

Now, I'm not sure why np.intersect1d does not work as expected on dict.keys() (see below), but why use NumPy here anyway? This function is designed to work on arrays (array_like inputs), not on arbitrary objects.

Furthermore, since the inputs have to be converted to arrays first, this is slower than pure Python. It is better to use a simple set intersection, set(d1) & set(d2), or even better (as suggested by @nocomment), d1.keys() & d2.keys():

d1 = dict.fromkeys(range(1, 1_000_000))
d2 = dict.fromkeys(range(1, 2_000_000, 2))

%timeit np.intersect1d(list(d1), list(d2))
# 433 ms ± 97.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit set(d1) & set(d2)
# 162 ms ± 19.8 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit d1.keys() & d2.keys()
# 60.8 ms ± 12.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
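Applied to the small dictionaries from the question, the two set-based options look like this. This is a minimal sketch; both expressions return a plain Python set, so element order is not defined and sorted() is used only to get a stable display.

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

# Key views are set-like, so & works on them directly and returns a set
common_views = d1.keys() & d2.keys()

# Equivalent: build sets first (iterating over a dict yields its keys)
common_sets = set(d1) & set(d2)

print(sorted(common_views))         # ['b']
print(common_views == common_sets)  # True
```

No conversion to NumPy arrays happens here, which is why these variants avoid the overhead measured above.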

But why?

intersect1d takes two array_like inputs, which are defined as any scalar or sequence that can be converted to an array (doc). However, np.array(d1.keys()) creates an object array containing the keys view itself (a single object) rather than the keys as items:

np.array(d1.keys())
# array(dict_keys(['a', 'b']), dtype=object)

np.array(d1.keys()).size
# 1
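A quick way to see the difference is to compare the array built from the view with the one built from a list of the same keys. This is a sketch; the variable names are only for illustration.

```python
import numpy as np

d1 = {'a': 1, 'b': 2}

view_arr = np.array(d1.keys())        # the view is stored as one object
list_arr = np.array(list(d1.keys()))  # the list is unpacked item by item

print(view_arr.ndim, view_arr.shape, view_arr.dtype)  # 0 () object
print(list_arr.ndim, list_arr.shape, list_arr.dtype)  # 1 (2,) <U1
```

The view yields a 0-dimensional object array, while the list yields a 1-D string array whose elements are the actual keys.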

An instructive demonstration is a self-intersection, which yields this single object:

np.intersect1d(d1.keys(), d1.keys())
# array([dict_keys(['a', 'b'])], dtype=object)
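The underlying reason is that a key view is set-like but not a sequence: it has a length and can be iterated, but it cannot be indexed. The sketch below uses the standard collections.abc classes to illustrate this; note that NumPy's own detection logic is implemented in C and is not literally this isinstance check, so treat it as an analogy. A built-in set, which is also not a sequence, gets the same single-object treatment, while a tuple is unpacked.

```python
from collections.abc import Sequence, Set

import numpy as np

keys = {'a': 1, 'b': 2}.keys()

# Key views are registered as set-like, but not as sequences
print(isinstance(keys, Set), isinstance(keys, Sequence))  # True False

# A plain set is not a sequence either and is wrapped as one object
print(np.array({'a', 'b'}).shape)  # ()

# A tuple is a sequence, so NumPy unpacks its items
print(np.array(('a', 'b')).shape)  # (2,)
```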
Answer by AGN Gazer

The arguments of intersect1d() are:

ar1, ar2 : array_like
    Input arrays. Will be flattened if not already 1D.

See https://numpy.org/doc/stable/reference/generated/numpy.intersect1d.html.

The return values of dict.keys() are dictionary view objects; see https://docs.python.org/3/library/stdtypes.html#dict-views. These views are not sequences (they cannot be indexed), so NumPy does not unpack them into their items; instead, each view becomes the single element of an object array:

>>> np.array({'a':1, 'b':2}.keys())
array(dict_keys(['a', 'b']), dtype=object)

intersect1d() therefore returns nothing useful: each input array has just one element, dict_keys(['a', 'b']) in the first and dict_keys(['b', 'c']) in the second. These two objects are not equal, so their intersection is empty.
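This can be reproduced by comparing the two wrapped views directly, as a whole-object comparison, and then contrasting that with the list-based call from the question. A minimal sketch:

```python
import numpy as np

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'c': 3}

a = np.array(d1.keys())  # 0-d object array holding one dict_keys object
b = np.array(d2.keys())

# The single elements are compared as whole objects, not key by key
print(d1.keys() == d2.keys())     # False, since {'a', 'b'} != {'b', 'c'}
print(np.intersect1d(a, b).size)  # 0

# Converting to lists first restores element-wise behavior
print(np.intersect1d(list(d1), list(d2)))  # ['b']
```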