When calculating the MSE loss for time-sequence data shaped (minibatch, feature, sequence length) in PyTorch using nn.MSELoss() with reduction="mean", is the average taken only over the minibatch dimension, or implicitly over the feature and time-sequence dimensions as well?
To check this, I ran the code below and printed the result:
import torch
import torch.nn as nn

loss = nn.MSELoss()  # reduction='mean' is the default
x_t = torch.ones((32, 4, 100))        # [minibatch size, feature size, time sequence length]
x_est = torch.ones((32, 4, 100)) * 2
loss_result = loss(x_t, x_est)
print(loss_result)
>>> tensor(1.)
nn.MSELoss computes the squared error between the two inputs element-wise. The reduction parameter then dictates whether to average or sum that element-wise result, and the reduction is applied over all dimensions, not just the minibatch dimension. In your example every element of (x_est - x_t) ** 2 equals 1, so the mean over all 32 * 4 * 100 elements is 1, which is why you get tensor(1.). Here is a comparison: