I'm trying to run quantization-aware training (Eager Mode Static Quantization) on a CUDA device in PyTorch.
I am facing the error below:
RuntimeError: expected scalar type Float but found Half.
Quantization-aware training works fine on the CPU. On the GPU, however, the inputs are on a CUDA device and the training loop runs under torch.cuda.amp.autocast() with torch.cuda.amp.GradScaler(enabled=True); in this setting I hit the error above.
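For context, here is a minimal sketch of my setup (the model and shapes are simplified stand-ins, not my actual code). On a CUDA machine this is the configuration that raises the dtype error; the snippet falls back to CPU when CUDA is unavailable:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import (DeQuantStub, QuantStub,
                                   get_default_qat_qconfig, prepare_qat)

class TinyNet(nn.Module):
    """Toy stand-in for the real model."""
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Eager-mode QAT preparation (model must be in train mode).
model = TinyNet().train()
model.qconfig = get_default_qat_qconfig('fbgemm')
model = prepare_qat(model).to(device)

opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

x = torch.randn(2, 3, 8, 8, device=device)
target = torch.randn(2, 8, 6, 6, device=device)

# Mixed-precision training step; this is where the error occurs on CUDA.
with torch.cuda.amp.autocast(enabled=(device == 'cuda')):
    loss = nn.functional.mse_loss(model(x), target)
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```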
I have tried the following, based on suggestions from https://github.com/NVIDIA/apex/issues/965:
- Converting all the model parameters to float32
- Replacing x = conv(x) with x = conv(x.float())

Neither of these resolves the error.
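Concretely, the two attempts looked roughly like this (toy module, illustrative names):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3)

# Attempt 1: force every parameter back to float32.
conv = conv.float()

# Attempt 2: cast the input to float32 before the conv,
# i.e. replace x = conv(x) with x = conv(x.float()).
x = torch.randn(2, 3, 8, 8)
y = conv(x.float())
```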
I also tried disabling AMP with with torch.cuda.amp.autocast(False):. This gets past the dtype error but fails with a different RuntimeError: Unsupported qscheme: per_channel_affine.
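I suspect the second error comes from the per-channel weight observers in the default 'fbgemm' QAT qconfig. If that's the case, would switching to a per-tensor weight fake-quant (sketch below, built from torch.ao.quantization names; this is my guess at a workaround, not something I've confirmed) be a reasonable way around it?

```python
import torch
from torch.ao.quantization import (FakeQuantize, MovingAverageMinMaxObserver,
                                   QConfig, get_default_qat_qconfig)

# The default 'fbgemm' QAT qconfig quantizes weights per channel,
# which I assume is what triggers "Unsupported qscheme: per_channel_affine".
qconfig = get_default_qat_qconfig('fbgemm')
print(qconfig.weight().qscheme)

# Candidate workaround: keep the same activation fake-quant but use a
# per-tensor weight fake-quant (similar to the 'qnnpack' defaults).
per_tensor_qconfig = QConfig(
    activation=qconfig.activation,
    weight=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-128,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
    ),
)
```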
Any pointers here would be of great help!