I'm using PyTorch Lightning's LR Finder but am getting an atypical curve. The loss starts at its lowest point when the learning rate is at its smallest, increases until it plateaus, and then exhibits the typical U-shaped curve. No matter what learning rate I start with (I've tried 1e-12, 1e-6, 1e-3, etc.), the same thing happens: the loss is always smallest at the lowest learning rate.
Example model class:

```python
import numpy as np
import timm
import torch
import torch.nn as nn
import pytorch_lightning as pl


class simple_model(pl.LightningModule):
    def __init__(self, encoder_name, lr, **kwargs):
        super().__init__()
        self.model = timm.create_model(encoder_name, pretrained=True)
        self.loss_fn = nn.BCEWithLogitsLoss()
        self.lr = lr

    def forward(self, image):
        return self.model(image)

    def shared_step(self, batch, stage):
        image = batch["image"]
        labels = batch["label"]
        logits = self.forward(image)
        loss = self.loss_fn(logits.squeeze(), labels.float())
        return {
            "loss": loss,
            "pred": logits.sigmoid().round().squeeze().cpu().detach().numpy(),
            "true": labels.cpu().detach().numpy(),
        }

    def shared_epoch_end(self, outputs, stage):
        losses = [x["loss"].item() for x in outputs]
        whole_dataset_loss = np.mean(losses)
        self.log_dict({f"{stage}_loss": whole_dataset_loss}, prog_bar=True)

    def training_step(self, batch, batch_idx):
        return self.shared_step(batch, "train")

    def training_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "train")

    def validation_step(self, batch, batch_idx):
        return self.shared_step(batch, "valid")

    def validation_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "valid")

    def test_step(self, batch, batch_idx):
        return self.shared_step(batch, "test")

    def test_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "test")

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), self.lr)
```
Example use of the LR Finder:

```python
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=140,
    precision=16,
    log_every_n_steps=1,
)
lr_finder = trainer.tuner.lr_find(model, dataset)
```
I've seen this behavior across multiple datasets and model types (fully supervised and SSL). Pretty much everything implementation-wise is out-of-the-box from PyTorch Lightning (1.9.0), and my models train and converge fine, so I'm not quite sure how to approach this. It happens regardless of whether I use ImageNet weights or random weights (all from the timm library).

I'm getting a similar plot to you, and there is a similar question on Stack Overflow: "Pytorch Lightning Learning Rate Tuners Giving unexpected results".
It may be due to the issue reported here: https://github.com/Lightning-AI/pytorch-lightning/issues/14167
That is, the LR finder may apply moving-average smoothing to the loss, with the running average initialised at 0, so the first few loss values are averaged together with 0, producing the artificially low loss at the left of the plot.
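To see the effect being described, here is a minimal, self-contained sketch (not Lightning's actual code, just an illustration of the reported behaviour) of how an exponential moving average initialised at zero distorts the first few points:

```python
# Illustration only: a running average initialised at 0 (as described in the
# linked issue) drags the first few smoothed values far below the raw losses.
beta = 0.98                                  # typical smoothing factor
raw_losses = [0.70, 0.69, 0.71, 0.68, 0.70]  # made-up, roughly constant losses

avg = 0.0
for step, loss in enumerate(raw_losses, start=1):
    avg = beta * avg + (1 - beta) * loss
    print(f"step {step}: raw={loss:.2f}  smoothed={avg:.4f}")

# Without bias correction, the smoothed curve starts near 0 and only slowly
# climbs towards ~0.7, which on an LR-finder plot looks exactly like
# "lowest loss at the lowest learning rate".
```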
However, that doesn't explain why there are many plots online without this behaviour, unless they were generated with different versions of Lightning. If this is the cause, though, the practical workaround is to make sure the lowest learning rate tested is far too low to be anywhere near the optimum, and then ignore the left-hand side of the resulting plot.
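As a rough sketch of that workaround (the keyword arguments `min_lr`, `max_lr` and `num_training` are based on Lightning 1.9's `lr_find` signature; check your version before relying on them):

```python
# Sweep from a learning rate far below any plausible optimum so the
# distorted left-hand side of the curve can safely be ignored.
lr_finder = trainer.tuner.lr_find(
    model,
    dataset,
    min_lr=1e-8,       # deliberately much too low
    max_lr=1.0,
    num_training=100,  # number of LR values tried during the sweep
)

fig = lr_finder.plot(suggest=True)  # inspect the plot, discounting the flat low-LR region
print(lr_finder.suggestion())       # suggested LR based on the steepest descent
```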