Python Hashing of "tupled" numpy Array

61 Views Asked by Jon Nir At 14 March 2024 at 11:29

I have a class MyClass where each instance stores pixels' x- and y-coordinates as two 1D numpy arrays of the same length. Two instances are considered equal if their coordinate arrays are identical (with NaN treated as equal to NaN).
I tried two methods of hashing: one casts both arrays to tuples and hashes those, and the other calls the tobytes() method on each array:

import numpy as np

class MyClass:
  # ... init, doA(), doB(), etc. ...
  def __eq__(self, other):
    # Instances of different types are never equal.
    if type(self) is not type(other):
      return False
    if not np.array_equal(self._x, other._x, equal_nan=True):
      return False
    if not np.array_equal(self._y, other._y, equal_nan=True):
      return False
    return True

  def hash1(self):
    return hash((tuple(self._x), tuple(self._y)))

  def hash2(self):
    return hash((self._x.tobytes(), self._y.tobytes()))

Calling hash1 repeatedly on the same instance yields different hashes, while hash2 returns the same value every time. Why do these behave so differently?

user2357112 On 17 March 2024 at 08:41 BEST ANSWER

A NumPy array doesn't store its elements as Python objects (unless you're using dtype=object). It stores raw hardware numeric values. That means when you call tuple, the array has to create Python objects for all the elements. For example, if your array has dtype float64, the array has to generate instances of numpy.float64.
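A small sketch of that conversion, assuming a float64 array (the names arr, first and second are made up for illustration). It only inspects what tuple() hands back: the element type, and whether two conversions of the same array share any element objects.

```python
import numpy as np

arr = np.array([1.5, np.nan, 3.0], dtype=np.float64)

# Each call to tuple() iterates the array and builds scalar objects on the fly.
first = tuple(arr)
second = tuple(arr)

# The elements are NumPy scalar wrappers, not values stored inside the array.
element_types = {type(v) for v in first}
print(element_types)

# Both tuples are alive at once, yet the NaN slot is backed by two
# different wrapper objects, one per conversion.
same_object = first[1] is second[1]
print(same_object)
```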

The array doesn't save these wrapper objects. Every time you call tuple, the array generates new wrapper objects. Two instances of numpy.float64 with NaN values aren't guaranteed to hash the same, so if your array contains NaNs, hashing the tuples isn't guaranteed to produce consistent results.
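For background: since Python 3.10 the built-in float hashes a NaN by object identity rather than by value, and as far as I know NumPy's float64 scalars follow the same rule, which would explain why fresh wrappers give fresh hashes. The sketch below is one possible way to sidestep the issue by hashing the raw buffers, as hash2 does. Pixels is a hypothetical stand-in for MyClass, and coercing both arrays to float64 in the constructor is an assumption made here so that equal coordinates always have identical bytes.

```python
import numpy as np

class Pixels:  # hypothetical stand-in for MyClass
    def __init__(self, x, y):
        # Assumption: force one dtype so equal values share one byte layout.
        self._x = np.asarray(x, dtype=np.float64)
        self._y = np.asarray(y, dtype=np.float64)

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return (np.array_equal(self._x, other._x, equal_nan=True)
                and np.array_equal(self._y, other._y, equal_nan=True))

    def __hash__(self):
        # Hash the raw buffers; no per-element wrapper objects are created.
        return hash((self._x.tobytes(), self._y.tobytes()))

a = Pixels([0.0, np.nan], [1.0, 2.0])
b = Pixels([0.0, np.nan], [1.0, 2.0])

print(hash(a) == hash(a))  # repeated calls on one instance agree
print(a == b, hash(a) == hash(b))
```

Two caveats with this approach: the hash of a bytes object is only stable within one interpreter process unless PYTHONHASHSEED is fixed, and values that compare equal but differ bitwise (0.0 and -0.0, or NaNs with different payloads) would be equal under __eq__ while hashing differently, so they may need normalizing first.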
