Basic Conv2d gives different results on CUDA and CPU


I tested a basic Conv2d operation, but it produces different results on CUDA and on the CPU.

When running on the CPU, the values are exactly 2.5600, 3.8400, 3.8400, ..., 3.8400, 3.8400, 2.5600. However, when running on CUDA, the values are 2.5587, 3.8381, 3.8381, ..., 3.8381, 3.8381, 2.5587.

When I convert the dtype to FP64, the discrepancy disappears. However, the default data type in PyTorch is FP32. In that case, is this issue unavoidable?

The code is below:

# %%
import torch
import torch.nn as nn

# Conv layer with every weight set to 0.1 (float64 here; the FP32 case is where the results differ)
conv_test = nn.Conv2d(64, 128, 3, padding=(1, 1), bias=False, dtype=torch.float64)
conv_test.weight = nn.Parameter(conv_test.weight * 0 + 0.1)
conv_test.to(torch.device("cpu"))
conv_test.to(torch.device("cuda:0"))  # the last .to() call decides where the module actually lives

# %%
# Constant input tensor filled with 0.1
dummy = torch.randn(10, 64, 8, 8).to(torch.float64)
dummy = dummy * 0 + 0.1
dummy = dummy.to(torch.device("cpu"))
dummy = dummy.to(torch.device("cuda:0"))  # likewise, the last .to() call wins

# %%
conv_test(dummy)
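
For reference, here is a minimal sketch of how the FP32 gap can be quantified (this assumes a CUDA device is available; the variable names are just for illustration):

import torch
import torch.nn as nn

# Same constant-weight convolution as above, but in the default FP32 dtype
conv = nn.Conv2d(64, 128, 3, padding=(1, 1), bias=False)
conv.weight = nn.Parameter(torch.full_like(conv.weight, 0.1))

x = torch.full((10, 64, 8, 8), 0.1)

out_cpu = conv(x)                                    # FP32 convolution on the CPU
out_gpu = conv.to("cuda:0")(x.to("cuda:0")).cpu()    # same convolution on the GPU

print((out_cpu - out_gpu).abs().max())               # size of the discrepancy
print(torch.allclose(out_cpu, out_gpu, rtol=1e-3))   # typically True within a modest tolerance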

1 Answer

Answered by Yakov Dan

It's unavoidable due to the way floating point numbers are implemented. For one, floating point arithmetic is not associative. You can verify it for yourself using something like this:

print((0.7 + (0.2 + 0.1)) == 1) # True
print(((0.7 + 0.2) + 0.1) == 1) # False

Although arithmetically the result should be the same, due to the way floating point arithmetic works, the order of summation matters.
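
The same effect is easy to see in float32 with PyTorch itself: summing the same values in a different order typically gives a slightly different result (a small sketch; the exact numbers depend on the random values and your hardware):

import torch

x = torch.randn(100_000, dtype=torch.float32)

print(x.sum())                    # one reduction order
print(x.flip(0).sum())            # same numbers, reversed order: often a slightly different value
print(x.double().sum().float())   # float64 accumulation as a more accurate reference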

In general, the same operator will be implemented differently on different types of hardware in order to make best use of that hardware's features. So, the actual order of arithmetic operations performed to compute conv2d will not be the same.

Lack of associativity is only one of the ways in which FP arithmetic can perform unexpectedly and may not even be the root cause of the actual difference you observe. But this is enough to understand that there's no reason to expect the outputs to match exactly.
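
One PyTorch-specific setting worth checking when chasing down a difference of this size (whether it applies depends on your GPU, so treat this as an assumption): on Ampere and newer NVIDIA GPUs, cuDNN convolutions are allowed to use TF32 by default, which keeps only about 10 bits of mantissa and can produce differences in the third decimal place. You can inspect and disable it before re-running the test:

import torch

# TF32 flags (only relevant on Ampere or newer NVIDIA GPUs)
print(torch.backends.cudnn.allow_tf32)        # True by default: convolutions may use TF32
print(torch.backends.cuda.matmul.allow_tf32)  # matmul counterpart

# Force full FP32 precision for cuDNN convolutions
torch.backends.cudnn.allow_tf32 = False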

You can read more about how floating-point values work here.