ReduceLROnPlateau is a scheduling technique that monitors a quantity and decays the learning rate when the quantity stops improving.
Whether the quantity has "improved" is judged by whether it has increased or decreased by at least a certain minimum amount. This minimum amount is the threshold.
The user can choose one of two modes: min or max.
If max is chosen, then the learning rate is decayed once the monitored quantity stops increasing by a certain minimum threshold.
If min is chosen, then the learning rate is decayed once the monitored quantity stops decreasing by a certain minimum threshold.
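The mode logic can be sketched in plain Python (an illustrative function, not the PyTorch source, with the threshold ignored for the moment):

```python
def improved(current, best, mode):
    # Simplified improvement check (threshold omitted): in 'min' mode a
    # strictly lower value counts as progress; in 'max' mode a strictly
    # higher one does.
    if mode == 'min':
        return current < best
    return current > best

assert improved(0.40, 0.50, 'min')        # loss went down: improvement
assert not improved(0.88, 0.91, 'max')    # accuracy went down: no improvement
```

In practice min is used for quantities like validation loss and max for quantities like accuracy.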
factor is the multiplier applied to the learning rate when the quantity stops improving (new_lr = lr * factor).
The factor value should be greater than 0 and less than 1. A value greater than 1 would increase the learning rate instead of decaying it, and a factor of exactly 1 would never change it; PyTorch rejects factors of 1.0 or more.
patience is the number of epochs with no improvement after which the learning rate is reduced. If the patience is 10, the scheduler tolerates the first 10 consecutive epochs with no improvement in the quantity and reduces the learning rate on the 11th.
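The patience counter can be simulated with a short, self-contained sketch (plain Python with illustrative names, not the actual PyTorch implementation):

```python
def lr_schedule(losses, patience, factor, lr):
    # Track the best loss seen so far; after more than `patience`
    # consecutive epochs without improvement, multiply the LR by `factor`.
    best, bad, history = float('inf'), 0, []
    for loss in losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad > patience:
                lr *= factor
                bad = 0
        history.append(lr)
    return history

# With patience=2 the first two stalled epochs are tolerated;
# the LR only drops on the third consecutive epoch without improvement.
history = lr_schedule([1.0, 1.0, 1.0, 1.0], patience=2, factor=0.1, lr=0.01)
assert history[:3] == [0.01, 0.01, 0.01]
assert abs(history[3] - 0.001) < 1e-12
```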
threshold is the minimum amount by which the quantity must change in order to count as an "improvement". For example, if the threshold is 0.001 (with the abs threshold mode) and the monitored quantity changes from 0.003 to 0.0025, the change of 0.0005 is smaller than the threshold and is not counted as an improvement.
For threshold_mode, the user can choose either rel or abs; this setting defines how a dynamic threshold is calculated.
Mathematically, in rel mode:
dynamic threshold = best * (1 + threshold) in 'max' mode, or best * (1 - threshold) in 'min' mode.
In abs mode:
dynamic threshold = best + threshold in 'max' mode, or best - threshold in 'min' mode.
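The four mode combinations above can be written out directly (a sketch mirroring the formulas, not the PyTorch source):

```python
def dynamic_threshold(best, threshold, mode, threshold_mode):
    # Mirrors the rel/abs formulas: rel scales the best value,
    # abs shifts it by a fixed amount.
    if threshold_mode == 'rel':
        return best * (1 + threshold) if mode == 'max' else best * (1 - threshold)
    return best + threshold if mode == 'max' else best - threshold

# Earlier example: in 'min'/'abs' mode a new value only counts as an
# improvement if it falls below best - threshold = 0.003 - 0.001 = 0.002,
# so 0.0025 is not an improvement.
assert not (0.0025 < dynamic_threshold(0.003, 0.001, 'min', 'abs'))
```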
cooldown is the number of epochs the scheduler waits after a learning-rate reduction before resuming normal operation, i.e. before it starts counting non-improving epochs again.
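One way to picture the patience/cooldown bookkeeping is the sketch below (hypothetical names and structure, not the PyTorch internals):

```python
def epoch_end(improved, state, patience=2, cooldown=2):
    # state holds the bad-epoch count and the remaining cooldown epochs.
    if state['cool'] > 0:
        state['cool'] -= 1       # still cooling down: bad epochs are ignored
        state['bad'] = 0
        return False             # no reduction this epoch
    if improved:
        state['bad'] = 0
        return False
    state['bad'] += 1
    if state['bad'] > patience:
        state['bad'] = 0
        state['cool'] = cooldown  # start the cooldown after reducing
        return True               # reduce the learning rate now
    return False

# Six consecutive non-improving epochs: reduce on the 3rd, then the
# cooldown absorbs the next two before bad epochs are counted again.
state = {'bad': 0, 'cool': 0}
steps = [epoch_end(False, state) for _ in range(6)]
assert steps == [False, False, True, False, False, False]
```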
min_lr is a lower bound on the learning rate (a single value for all parameter groups, or a list with one value per group). Once the learning rate reaches this floor, it stays constant there.
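The clamping itself is just a max() against the floor (a sketch of the arithmetic with made-up values, not the library code):

```python
min_lr = 1e-6
old_lr = 5e-6
factor = 0.1

# 5e-6 * 0.1 = 5e-7 would fall below the floor, so the LR is clamped.
new_lr = max(old_lr * factor, min_lr)
assert new_lr == min_lr
```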
eps is the minimum meaningful decay of the learning rate. If the difference between the previous and the new learning rate is less than eps, the update is ignored and the previous learning rate is kept.
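The eps guard can be shown with one concrete calculation (a sketch of the comparison, using the default eps of 1e-08):

```python
eps = 1e-8
old_lr = 1e-8
factor = 0.1

candidate = old_lr * factor        # 1e-9
if old_lr - candidate < eps:       # 9e-9 < 1e-8: the decay is too small
    new_lr = old_lr                # keep the previous learning rate
else:
    new_lr = candidate
assert new_lr == old_lr
```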
```python
import torch

# Toy setup: a small linear model, a dummy dataset, and an MSE loss.
model = torch.nn.Linear(2, 2)
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(5)]
loss_fn = torch.nn.MSELoss()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001,
    threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    # Unlike other schedulers, ReduceLROnPlateau must be given the
    # monitored quantity (here the training loss) in step().
    scheduler.step(loss)
```