CosineAnnealingLR is a learning rate scheduling technique that starts with a large learning rate, decreases it aggressively to a value near 0, and then increases it again.

The learning rate varies according to the cosine annealing schedule. Mathematically, the learning rate is given by

€€\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 + \cos\left(\frac{T_{cur}}{T_{max}} \pi\right)\right)€€

where €€\eta_t€€ is the current learning rate, €€\eta_{max}€€ and €€\eta_{min}€€ are the maximum and minimum learning rates respectively, €€T_{cur}€€ is the number of epochs accumulated so far, and €€T_{max}€€ is the maximum number of iterations.

From the above equation, we can see that once €€T_{cur}=T_{max}€€, the cosine term equals -1 and the learning rate becomes €€\eta_{min}€€.
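To make the formula concrete, here is a minimal Python sketch of the closed-form expression. The helper name `cosine_annealing_lr` and the numeric values are assumptions chosen for illustration; they do not come from any library. Evaluating it at a few steps shows the rate starting at the maximum and ending at the minimum once the current step equals the maximum.

```python
import math

def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Closed-form cosine annealing value at step t_cur (hypothetical helper)."""
    cosine_term = 1 + math.cos(math.pi * t_cur / t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * cosine_term

# Assumed illustrative values, not taken from the article.
eta_max, eta_min, t_max = 0.1, 0.001, 100
for t_cur in (0, 25, 50, 75, 100):
    print(t_cur, cosine_annealing_lr(t_cur, t_max, eta_max, eta_min))
```

Halfway through the schedule the cosine is 0, so the learning rate sits at the midpoint between the maximum and minimum values.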

Major Parameters

T Max

T Max is the maximum number of iterations used in the function above. It sets how many scheduler steps it takes for the learning rate to fall from its maximum to its minimum.

Figure: Learning rate with different T max.

Note that the €€T_0€€ in the given figure is €€T_{max}€€.
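The effect of T max can be sketched by evaluating the cosine formula for a few settings. This is an illustration under stated assumptions: `lr_at` is a hypothetical helper, and the maximum learning rate of 0.1 and the T max values are made up for the example.

```python
import math

def lr_at(step, t_max, eta_max=0.1, eta_min=0.0):
    # Closed-form cosine annealing; eta_max and eta_min are assumed values.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * step / t_max))

steps = range(0, 101, 10)
for t_max in (10, 25, 50):  # hypothetical T max settings
    schedule = [round(lr_at(s, t_max), 4) for s in steps]
    print(f"T_max={t_max}: {schedule}")
```

A smaller T max reaches the minimum sooner. Because the closed-form expression is periodic with period 2 × T max, evaluating it past T max makes the learning rate climb back toward the maximum.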

Eta Min

Eta min is the minimum learning rate that the cosine annealing schedule can reach. In the given figure, the learning rate drops to 0 and then increases aggressively again, so eta min is 0.
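As a rough check of how eta min acts as the floor of the schedule, the sketch below sweeps one annealing period for two settings. The helper `lr_at` and all numeric values are assumptions for illustration only.

```python
import math

def lr_at(step, t_max, eta_max, eta_min):
    # Closed-form cosine annealing value at a given step (hypothetical helper).
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * step / t_max))

t_max, eta_max = 50, 0.1  # assumed values
for eta_min in (0.0, 0.01):
    values = [lr_at(s, t_max, eta_max, eta_min) for s in range(t_max + 1)]
    print(f"eta_min={eta_min}: lowest lr={min(values):.4f}, highest lr={max(values):.4f}")
```

Raising eta min lifts the bottom of the curve without changing where it starts, which can be useful when a learning rate of exactly 0 would stall training.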

Code Implementation

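import torch
# Placeholder model, loss, data, and learning rate (assumed for illustration)
model = torch.nn.Linear(2, 2)
loss_fn = torch.nn.MSELoss()
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(50)]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01, amsgrad=False)
# T_max counts scheduler.step() calls: 20 epochs * 50 batches = 1000 steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1)
for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the cosine schedule once per batch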
