Basic Conv2d gives different results on CUDA and CPU


I tested a basic Conv2d operation, but it produces different results on CUDA and on the CPU.

When running on the CPU, the values are exactly 2.5600, 3.8400, 3.8400, ..., 3.8400, 3.8400, 2.5600. However, when running on CUDA, the values are 2.5587, 3.8381, 3.8381, ..., 3.8381, 3.8381, 2.5587.

When I convert the dtype to FP64, the discrepancy disappears. However, the default data type in PyTorch is FP32. In that case, is this issue unavoidable?

The code is below:

# %%
import torch
import torch.nn as nn

# Conv layer with every weight set to 0.1 (float64 here; the FP32 case is where the results differ)
conv_test = nn.Conv2d(64, 128, 3, padding=(1, 1), bias=False, dtype=torch.float64)
conv_test.weight = nn.Parameter(conv_test.weight * 0 + 0.1)
conv_test.to(torch.device("cpu"))
conv_test.to(torch.device("cuda:0"))  # the last .to() call decides where the module actually lives

# %%
# Constant input tensor filled with 0.1
dummy = torch.randn(10, 64, 8, 8).to(torch.float64)
dummy = dummy * 0 + 0.1
dummy = dummy.to(torch.device("cpu"))
dummy = dummy.to(torch.device("cuda:0"))  # likewise, the last .to() call wins

# %%
conv_test(dummy)
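
For reference, here is a minimal sketch of how the FP32 gap can be quantified (this assumes a CUDA device is available; the variable names are just for illustration):

import torch
import torch.nn as nn

# Same constant-weight convolution as above, but in the default FP32 dtype
conv = nn.Conv2d(64, 128, 3, padding=(1, 1), bias=False)
conv.weight = nn.Parameter(torch.full_like(conv.weight, 0.1))

x = torch.full((10, 64, 8, 8), 0.1)

out_cpu = conv(x)                                    # FP32 convolution on the CPU
out_gpu = conv.to("cuda:0")(x.to("cuda:0")).cpu()    # same convolution on the GPU

print((out_cpu - out_gpu).abs().max())               # size of the discrepancy
print(torch.allclose(out_cpu, out_gpu, rtol=1e-3))   # typically True within a modest tolerance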

1 Answer

Answered by Yakov Dan

It's unavoidable due to the way floating point numbers are implemented. For one, floating point arithmetic is not associative. You can verify it for yourself using something like this:

print((0.7 + (0.2 + 0.1)) == 1) # True
print(((0.7 + 0.2) + 0.1) == 1) # False

Although arithmetically the result should be the same, due to the way floating point arithmetic works, the order of summation matters.
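
The same effect is easy to see in float32 with PyTorch itself: summing the same values in a different order typically gives a slightly different result (a small sketch; the exact numbers depend on the random values and your hardware):

import torch

x = torch.randn(100_000, dtype=torch.float32)

print(x.sum())                    # one reduction order
print(x.flip(0).sum())            # same numbers, reversed order: often a slightly different value
print(x.double().sum().float())   # float64 accumulation as a more accurate reference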

In general, the same operator will be implemented differently on different types of hardware in order to make best use of that hardware's features. So, the actual order of arithmetic operations performed to compute conv2d will not be the same.

Lack of associativity is only one of the ways in which FP arithmetic can perform unexpectedly and may not even be the root cause of the actual difference you observe. But this is enough to understand that there's no reason to expect the outputs to match exactly.
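
One PyTorch-specific setting worth checking when chasing down a difference of this size (whether it applies depends on your GPU, so treat this as an assumption): on Ampere and newer NVIDIA GPUs, cuDNN convolutions are allowed to use TF32 by default, which keeps only about 10 bits of mantissa and can produce differences in the third decimal place. You can inspect and disable it before re-running the test:

import torch

# TF32 flags (only relevant on Ampere or newer NVIDIA GPUs)
print(torch.backends.cudnn.allow_tf32)        # True by default: convolutions may use TF32
print(torch.backends.cuda.matmul.allow_tf32)  # matmul counterpart

# Force full FP32 precision for cuDNN convolutions
torch.backends.cudnn.allow_tf32 = False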

You can read more about how floating-point values work here.