I'm using PyTorch Lightning's LR Finder but am getting an atypical curve. The loss starts at its lowest point when the learning rate is at its smallest, increases until it plateaus, and then exhibits the typical U-shaped curve. No matter what learning rate I start with (I've tried 1e-12, 1e-6, 1e-3, etc.), the same thing happens: the loss is always smallest at the lowest learning rate.
Example model class:

```python
import numpy as np
import timm
import torch
import torch.nn as nn
import pytorch_lightning as pl


class simple_model(pl.LightningModule):
    def __init__(self, encoder_name, lr, **kwargs):
        super().__init__()
        self.model = timm.create_model(encoder_name, pretrained=True)
        self.loss_fn = nn.BCEWithLogitsLoss()
        self.lr = lr

    def forward(self, image):
        return self.model(image)

    def shared_step(self, batch, stage):
        image = batch["image"]
        labels = batch["label"]
        logits = self.forward(image)
        loss = self.loss_fn(logits.squeeze(), labels.float())
        return {
            "loss": loss,
            "pred": logits.sigmoid().round().squeeze().cpu().detach().numpy(),
            "true": labels.cpu().detach().numpy(),
        }

    def shared_epoch_end(self, outputs, stage):
        losses = [x["loss"].item() for x in outputs]
        whole_dataset_loss = np.mean(losses)
        self.log_dict({f"{stage}_loss": whole_dataset_loss}, prog_bar=True)

    def training_step(self, batch, batch_idx):
        return self.shared_step(batch, "train")

    def training_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "train")

    def validation_step(self, batch, batch_idx):
        return self.shared_step(batch, "valid")

    def validation_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "valid")

    def test_step(self, batch, batch_idx):
        return self.shared_step(batch, "test")

    def test_epoch_end(self, outputs):
        return self.shared_epoch_end(outputs, "test")

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), self.lr)
```
Example use of the LR Finder:

```python
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=140,
    precision=16,
    log_every_n_steps=1,
)
lr_finder = trainer.tuner.lr_find(model, dataset)
```
I've seen this behavior across multiple datasets and model types (fully supervised and SSL). Pretty much everything implementation-wise is out-of-the-box from PyTorch Lightning (1.9.0), and my models train and converge fine, so I'm not quite sure how to approach this. It happens regardless of whether I use ImageNet weights or random weights (all from the timm library).

I'm getting a similar plot to you, and there is a similar question on Stack Overflow: "Pytorch Lightning Learning Rate Tuners Giving unexpected results".
It may be due to the issue reported here: https://github.com/Lightning-AI/pytorch-lightning/issues/14167
That is, the LR finder may apply moving-average smoothing to the loss, with the running average initialised at 0, so the first few loss values are averaged together with 0, producing the artificially low loss at the left of the plot.
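To see the effect being described, here is a minimal, self-contained sketch (not Lightning's actual code, just an illustration of the reported behaviour) of how an exponential moving average initialised at zero distorts the first few points:

```python
# Illustration only: a running average initialised at 0 (as described in the
# linked issue) drags the first few smoothed values far below the raw losses.
beta = 0.98                                  # typical smoothing factor
raw_losses = [0.70, 0.69, 0.71, 0.68, 0.70]  # made-up, roughly constant losses

avg = 0.0
for step, loss in enumerate(raw_losses, start=1):
    avg = beta * avg + (1 - beta) * loss
    print(f"step {step}: raw={loss:.2f}  smoothed={avg:.4f}")

# Without bias correction, the smoothed curve starts near 0 and only slowly
# climbs towards ~0.7, which on an LR-finder plot looks exactly like
# "lowest loss at the lowest learning rate".
```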
However, that doesn't explain why there are many plots online without this behaviour, unless they were generated with different versions of Lightning. If this is the cause, though, the practical workaround is to make sure the lowest learning rate tested is far too low to be anywhere near the optimum, and then ignore the left-hand side of the resulting plot.
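As a rough sketch of that workaround (the keyword arguments `min_lr`, `max_lr` and `num_training` are based on Lightning 1.9's `lr_find` signature; check your version before relying on them):

```python
# Sweep from a learning rate far below any plausible optimum so the
# distorted left-hand side of the curve can safely be ignored.
lr_finder = trainer.tuner.lr_find(
    model,
    dataset,
    min_lr=1e-8,       # deliberately much too low
    max_lr=1.0,
    num_training=100,  # number of LR values tried during the sweep
)

fig = lr_finder.plot(suggest=True)  # inspect the plot, discounting the flat low-LR region
print(lr_finder.suggestion())       # suggested LR based on the steepest descent
```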