CosineAnnealingLR is a learning rate scheduling technique that starts with a large learning rate, decreases it rapidly to a value near 0, and then increases it again.

The learning rate varies according to the cosine annealing schedule. Mathematically, the learning rate is given as:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 + \cos\left(\frac{T_{cur}}{T_{max}} \pi\right)\right)

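Where €€\eta_t€€ is the current learning rate, €€\eta_{max}€€ and €€\eta_{min}€€ are the maximum and minimum learning rates respectively, €€T_{cur}€€ is the number of epochs accumulated so far, and €€T_{max}€€ is the number of epochs over which the learning rate anneals from €€\eta_{max}€€ to €€\eta_{min}€€.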

From the above equation, we can see that once €€T{cur}=T{max}€€, the learning rate becomes €€\eta_{min}€€.
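To make the equation concrete, here is a minimal sketch that evaluates it directly in plain Python. The function name `cosine_annealing_lr` and the example values (`eta_max=0.1`, `eta_min=0.001`, `t_max=100`) are assumptions chosen for illustration, not part of any library.

```python
import math


def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Evaluate the closed-form cosine annealing equation at step t_cur."""
    cosine_term = 1 + math.cos(math.pi * t_cur / t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * cosine_term


# Illustrative values (assumptions, not recommendations).
start = cosine_annealing_lr(0, 100, eta_max=0.1, eta_min=0.001)
middle = cosine_annealing_lr(50, 100, eta_max=0.1, eta_min=0.001)
end = cosine_annealing_lr(100, 100, eta_max=0.1, eta_min=0.001)

print(start, middle, end)
```

At `t_cur = 0` the cosine term equals 2, so the rate starts at `eta_max`. At `t_cur = t_max` the cosine term is 0, which leaves only `eta_min`. Halfway through, the rate sits at the midpoint between the two.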

Major Parameters

T Max

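T Max (the T_max argument in PyTorch) is the maximum number of iterations, €€T_{max}€€ in the equation above. It sets how many scheduler steps the learning rate takes to fall from its maximum to its minimum.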

Learning rate with different T max. Source:

Note that the €€T_0€€ in the given figure is €€T_{max}€€.
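The effect of T Max can be sketched by evaluating the closed-form equation for two different values. This is plain Python, not PyTorch's internal implementation; the helper names `cosine_lr` and `schedule` and the values `eta_max=0.1`, `t_max=10`, and `t_max=20` are assumptions for illustration.

```python
import math


def cosine_lr(t_cur, t_max, eta_max=0.1, eta_min=0.0):
    """Closed-form cosine annealing learning rate at step t_cur."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))


def schedule(t_max, num_epochs=40):
    """Learning rate for every epoch from 0 to num_epochs inclusive."""
    return [cosine_lr(t, t_max) for t in range(num_epochs + 1)]


fast = schedule(t_max=10)  # reaches the minimum after 10 epochs
slow = schedule(t_max=20)  # reaches the minimum after 20 epochs

# Index of the first epoch at which each schedule hits its minimum.
first_min_fast = fast.index(min(fast))
first_min_slow = slow.index(min(slow))

print(first_min_fast, first_min_slow)
```

A smaller T Max makes the rate fall faster, and because the cosine is periodic, evaluating the equation past T Max makes the rate climb back to its maximum at twice T Max.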

Eta Min

Eta min (eta_min in PyTorch) is the minimum learning rate that the cosine annealing schedule can reach. In the given figure, the learning rate drops to 0 and then increases sharply, so eta min is 0.
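As a rough check of how eta min acts as a floor, the sketch below steps a PyTorch scheduler with a non-zero `eta_min` and records the learning rate after each step. The single dummy parameter, the SGD optimizer, and the values `lr=0.1`, `T_max=10`, and `eta_min=0.001` are assumptions chosen only to keep the example small.

```python
import torch

# A single dummy parameter stands in for a real model (illustrative only).
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)  # the initial lr acts as eta_max
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=0.001
)

lrs = [scheduler.get_last_lr()[0]]
for _ in range(20):
    optimizer.step()  # no gradients were computed, so the parameter is unchanged
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])

print(min(lrs))
```

Under these assumptions the recorded rate should bottom out at roughly `eta_min` after `T_max` steps and never drop below it, then rise again toward the initial rate.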

Code Implementation

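import torch

# Placeholder model, loss, data, and learning rate so the example runs end to end.
model = torch.nn.Linear(2, 2)
loss_fn = torch.nn.MSELoss()
dataset = [(torch.randn(2), torch.randn(2)) for _ in range(8)]
learning_rate = 0.1
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
# T_max counts calls to scheduler.step(), not batches.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0)
for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()  # clear gradients from the previous batch
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # update the learning rate once per epoch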
Last updated on Jun 01, 2022
