tensorflow/numpy computation results depend on the processor


Edit (07/11/2023)

After the various remarks in the comments, we now understand part of the discrepancy between the results we obtained. The fact that the results of computations on GPU and CPU may differ is convincingly explained in the reference provided by user ken in their comment. Nevertheless, other discrepancies that appear when the computations are performed on different CPUs are not explained there. After further research it became clear that the problem does not come from Tensorflow alone, but also from Numpy.

In fact, in our first tests we were using Tensorflow's random generators, and observed that their outputs already differed slightly depending on the computer we used.

Since we were focused on operations rather than on the generation of numbers, we switched to Numpy's generator, thinking that it did not depend on the processor used.

Now we are convinced that in fact

  • both Tensorflow and Numpy return the same result for the same operation when the operands are exactly the same, even when the computations are performed on different CPUs, but
  • both the Numpy and the Tensorflow random generators produce values that depend on the CPU.

Here is an example:

import numpy as np

np.random.seed(263)
print(np.random.randn(1))

On an i5-1135G7 and an i5-5357U, we get:

[1.5188843533672705]

On Google Colab's Xeon and on an i7-8700, we get:

[1.5188843533672707]

(for all other seeds from 0 to 262 the results are exactly the same on both systems)
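To narrow down where the discrepancy comes from, one can compare the generator's integer output instead of the Gaussian output. This is only a sketch based on our assumption that the MT19937 integer stream is pure integer arithmetic and therefore identical on every CPU, whereas randn additionally applies a floating-point transform (log/sqrt) that may be affected by the hardware or the math library:

# Hedged check: if the integer draws below match on two machines while
# randn(1) differs, the difference presumably comes from the floating-point
# transform applied to the random bits, not from the bit stream itself.
import numpy as np

np.random.seed(263)
print(np.random.randint(0, 2**31 - 1, size=5))   # integer-only path

np.random.seed(263)
print(np.random.randn(1))                        # floating-point transform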

So the new question might be:

Why do the Numpy and Tensorflow random generators behave differently depending on the computer we use?

end of edit

The exact same computation on different computers doesn't necessarily give the same result.

Here is a minimal and reproducible example:

import tensorflow as tf
import numpy as np

np.random.seed(0)

# Fixed seed: repeated runs on the same machine start from the same input data.
numpy_numbers        = np.random.randn(100000)
tensorflow_numbers   = tf.convert_to_tensor(numpy_numbers)

numpy_sum      = np.sum(numpy_numbers)
tensorflow_sum = tf.reduce_sum(tensorflow_numbers)

# Hash the individual values to check that both libraries see exactly the
# same inputs before comparing the two sums.
print("hash of numpy_numbers      : ",np.sum([hash(x) for x in numpy_numbers]))
print("hash of tensorflow_numbers : ",np.sum([hash(x.numpy()) for x in tensorflow_numbers]))

print("Numpy sum      : ",tf.convert_to_tensor(numpy_sum))
print("Tensorflow sum : ",tensorflow_sum)

We tried this code on various computers with

  • the same version of Python (3.10.12)
  • the same version of Tensorflow (2.12.0)
  • the same version of Numpy (1.23.5)

and observed that

  • on a given computer the result is always the same
  • on different computers the result differs

For example, on Google Colab,

  • with the choice of runtime type "CPU" (Intel Xeon 2.20GHz) we get

hash of numpy_numbers      :  -2611790474163640565
hash of tensorflow_numbers :  -2611790474163640565

Numpy sum      :  tf.Tensor(157.670050812534, shape=(), dtype=float64)
Tensorflow sum :  tf.Tensor(157.67005081253396, shape=(), dtype=float64)

  • with the choice of runtime type "GPU" (Tesla T4) we get

hash of numpy_numbers      :  -2611790474163640565
hash of tensorflow_numbers :  -2611790474163640565

Numpy sum      :  tf.Tensor(157.670050812534, shape=(), dtype=float64)
Tensorflow sum :  tf.Tensor(157.67005081253393, shape=(), dtype=float64)
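As a point of comparison, Python's math.fsum adds the same values without intermediate rounding, so it gives the correctly rounded float64 sum independently of hardware or summation order. A minimal sketch, reusing numpy_numbers, numpy_sum and tensorflow_sum from the example above:

import math

# math.fsum tracks exact partial sums, so its result does not depend on the
# order in which the values are added.
exact_sum = math.fsum(numpy_numbers)
print("fsum (correctly rounded) : ", exact_sum)
print("np.sum                   : ", numpy_sum)
print("tf.reduce_sum            : ", float(tensorflow_sum))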

Question:

What is the explanation for this phenomenon?

Our guess:

We came up with the following guess:

Tensorflow, when given a large sum to compute, first fills the cache with the first numbers to be summed and computes a first partial sum, then does the same with the following numbers until all of them have been used, and finally adds the partial sums together.

Since floating-point addition is not associative, the result would then depend on the grouping of the additions, which in turn would depend on the size of the cache.

But this is only a guess, and we would like to have a firm understanding of what really happens.
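To illustrate the mechanism we have in mind (without claiming this is what Tensorflow actually does internally), here is a small sketch showing that merely changing the grouping of the additions can change the float64 result; the chunk size of 4096 is an arbitrary stand-in for a cache-sized block:

import numpy as np

np.random.seed(0)
x = np.random.randn(100000)

# One running sum, strictly left to right.
sequential = 0.0
for v in x:
    sequential += v

# Partial sums over fixed-size blocks, added together at the end.
chunk = 4096
partials = [x[i:i + chunk].sum() for i in range(0, len(x), chunk)]
chunked = sum(partials)

print("sequential sum : ", repr(sequential))
print("chunked sum    : ", repr(chunked))
print("np.sum         : ", repr(np.sum(x)))   # numpy uses pairwise summation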
