Can a low gradient norm be an indicator of a problem in my deep learning model?


Does anyone know how to troubleshoot deep learning model training using the gradient norm? I am reproducing the work of a research paper but not getting the same results. I am training a model of 16 residual blocks with ReLU activations, categorical cross-entropy loss, the Adam optimizer, and a step LR scheduler: 10 epochs, batch size 12, starting learning rate 0.001, decayed by 0.5 every epoch after the 6th. The dataset is imbalanced, so I tried a weighted loss and focal loss, but neither helped. To troubleshoot the training I computed the total gradient norm every 1000 batches and noticed the values are small (around 1e-4). Is this normal, or can it be an indicator of some problem?

[Plot: Total Gradient Norm]
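For reference, this is roughly how the total gradient norm is computed after a backward pass. This is a minimal sketch assuming PyTorch (the question mentions Adam and a step LR scheduler); the tiny `nn.Sequential` model here is a hypothetical stand-in for the 16-residual-block network:

```python
import torch
import torch.nn as nn

# Hypothetical small model standing in for the 16-residual-block network.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
criterion = nn.CrossEntropyLoss()  # categorical cross-entropy
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

x = torch.randn(12, 8)            # batch size 12, as in the question
y = torch.randint(0, 3, (12,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Total gradient norm: L2 norm taken over all parameter gradients at once.
total_norm = torch.sqrt(sum(
    p.grad.detach().pow(2).sum()
    for p in model.parameters() if p.grad is not None
))
print(f"total grad norm: {total_norm.item():.4e}")
```

Equivalently, `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float('inf'))` returns the same total norm without clipping anything, which is a common way to log it.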

I looked for references on interpreting the gradient norm and using it to troubleshoot a model, but found nothing. My second question is: does anyone have good resources on how to troubleshoot and diagnose deep learning models?
