ReduceLROnPlateau is a scheduling technique that monitors a quantity and decays the learning rate when the quantity stops improving.
Whether the quantity has "improved" is judged by whether it has increased or decreased by at least a certain minimum amount. This minimum amount is the threshold.
The user can choose one of two modes: min or max.
If max is chosen, then the learning rate is decayed once the monitored quantity stops increasing by a certain minimum threshold.
If min is chosen, then the learning rate is decayed once the monitored quantity stops decreasing by a certain minimum threshold.
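The mode logic can be sketched in plain Python (an illustrative function, not the PyTorch source, with the threshold ignored for the moment):

```python
def improved(current, best, mode):
    # Simplified improvement check (threshold omitted): in 'min' mode a
    # strictly lower value counts as progress; in 'max' mode a strictly
    # higher one does.
    if mode == 'min':
        return current < best
    return current > best

assert improved(0.40, 0.50, 'min')        # loss went down: improvement
assert not improved(0.88, 0.91, 'max')    # accuracy went down: no improvement
```

In practice min is used for quantities like validation loss and max for quantities like accuracy.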
factor is the multiplier applied to the learning rate when the quantity stops improving (new_lr = lr * factor).
The factor value should be greater than 0 and less than 1. A value greater than 1 would increase the learning rate instead of decaying it, and a factor of exactly 1 would never change it; PyTorch rejects factors of 1.0 or more.
patience is the number of epochs with no improvement after which the learning rate is reduced. If the patience is 10, the scheduler tolerates the first 10 consecutive epochs with no improvement in the quantity and reduces the learning rate on the 11th.
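The patience counter can be simulated with a short, self-contained sketch (plain Python with illustrative names, not the actual PyTorch implementation):

```python
def lr_schedule(losses, patience, factor, lr):
    # Track the best loss seen so far; after more than `patience`
    # consecutive epochs without improvement, multiply the LR by `factor`.
    best, bad, history = float('inf'), 0, []
    for loss in losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad > patience:
                lr *= factor
                bad = 0
        history.append(lr)
    return history

# With patience=2 the first two stalled epochs are tolerated;
# the LR only drops on the third consecutive epoch without improvement.
history = lr_schedule([1.0, 1.0, 1.0, 1.0], patience=2, factor=0.1, lr=0.01)
assert history[:3] == [0.01, 0.01, 0.01]
assert abs(history[3] - 0.001) < 1e-12
```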
threshold is the minimum amount by which the quantity must change in order to count as an "improvement". For example, if the threshold is 0.001 (with the abs threshold mode) and the monitored quantity changes from 0.003 to 0.0025, the change of 0.0005 is smaller than the threshold and is not counted as an improvement.
For threshold_mode, the user can choose either rel or abs; this setting defines how a dynamic threshold is calculated.
Mathematically, in rel mode:
dynamic threshold = best * (1 + threshold) in 'max' mode, or best * (1 - threshold) in 'min' mode.
In abs mode:
dynamic threshold = best + threshold in 'max' mode, or best - threshold in 'min' mode.
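The four mode combinations above can be written out directly (a sketch mirroring the formulas, not the PyTorch source):

```python
def dynamic_threshold(best, threshold, mode, threshold_mode):
    # Mirrors the rel/abs formulas: rel scales the best value,
    # abs shifts it by a fixed amount.
    if threshold_mode == 'rel':
        return best * (1 + threshold) if mode == 'max' else best * (1 - threshold)
    return best + threshold if mode == 'max' else best - threshold

# Earlier example: in 'min'/'abs' mode a new value only counts as an
# improvement if it falls below best - threshold = 0.003 - 0.001 = 0.002,
# so 0.0025 is not an improvement.
assert not (0.0025 < dynamic_threshold(0.003, 0.001, 'min', 'abs'))
```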
cooldown is the number of epochs the scheduler waits after a learning-rate reduction before resuming normal operation, i.e. before it starts counting non-improving epochs again.
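One way to picture the patience/cooldown bookkeeping is the sketch below (hypothetical names and structure, not the PyTorch internals):

```python
def epoch_end(improved, state, patience=2, cooldown=2):
    # state holds the bad-epoch count and the remaining cooldown epochs.
    if state['cool'] > 0:
        state['cool'] -= 1       # still cooling down: bad epochs are ignored
        state['bad'] = 0
        return False             # no reduction this epoch
    if improved:
        state['bad'] = 0
        return False
    state['bad'] += 1
    if state['bad'] > patience:
        state['bad'] = 0
        state['cool'] = cooldown  # start the cooldown after reducing
        return True               # reduce the learning rate now
    return False

# Six consecutive non-improving epochs: reduce on the 3rd, then the
# cooldown absorbs the next two before bad epochs are counted again.
state = {'bad': 0, 'cool': 0}
steps = [epoch_end(False, state) for _ in range(6)]
assert steps == [False, False, True, False, False, False]
```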
min_lr is a lower bound on the learning rate (a single value for all parameter groups, or a list with one value per group). Once the learning rate reaches this floor, it stays constant there.
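The clamping itself is just a max() against the floor (a sketch of the arithmetic with made-up values, not the library code):

```python
min_lr = 1e-6
old_lr = 5e-6
factor = 0.1

# 5e-6 * 0.1 = 5e-7 would fall below the floor, so the LR is clamped.
new_lr = max(old_lr * factor, min_lr)
assert new_lr == min_lr
```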
eps is the minimum meaningful decay of the learning rate. If the difference between the previous and the new learning rate is less than eps, the update is ignored and the previous learning rate is kept.
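The eps guard can be shown with one concrete calculation (a sketch of the comparison, using the default eps of 1e-08):

```python
eps = 1e-8
old_lr = 1e-8
factor = 0.1

candidate = old_lr * factor        # 1e-9
if old_lr - candidate < eps:       # 9e-9 < 1e-8: the decay is too small
    new_lr = old_lr                # keep the previous learning rate
else:
    new_lr = candidate
assert new_lr == old_lr
```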
```python
import torch

# Toy setup: a small linear model, a dummy dataset, and an MSE loss.
model = torch.nn.Linear(2, 2)
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(5)]
loss_fn = torch.nn.MSELoss()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001,
    threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    # Unlike other schedulers, ReduceLROnPlateau must be given the
    # monitored quantity (here the training loss) in step().
    scheduler.step(loss)
```