CosineAnnealingLR

CosineAnnealingLR is a scheduling technique that starts with a large learning rate, aggressively decreases it to a value near 0, and then increases it again.

This variation of the learning rate follows the cosine annealing schedule. Mathematically, the current learning rate is given as

$$
\eta_t=\eta_{min}+\frac{1}{2}(\eta_{max}-\eta_{min}) \left( 1+\cos\left( \frac{T_{cur}}{T_{max}} \pi \right) \right)
$$

where €€\eta_t€€ is the current learning rate, €€\eta_{max}€€ and €€\eta_{min}€€ are the maximum and minimum learning rates respectively, €€T_{cur}€€ is the current number of accumulated epochs, and €€T_{max}€€ is the maximum number of iterations (see below).

From the above equation, we can see that once €€T_{cur}=T_{max}€€, the learning rate becomes €€\eta_{min}€€.
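
To make the schedule concrete, here is a minimal sketch that evaluates the formula directly in plain Python; the function name and the chosen values for eta_max, eta_min, and T_max are illustrative assumptions, not part of any library.

import math

def cosine_annealing_lr(t_cur, t_max, eta_min, eta_max):
    # Learning rate at epoch t_cur under the cosine annealing schedule
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

eta_max, eta_min, T_max = 0.1, 0.0, 100
print(cosine_annealing_lr(0, T_max, eta_min, eta_max))     # 0.1   -- starts at eta_max
print(cosine_annealing_lr(50, T_max, eta_min, eta_max))    # ~0.05 -- halfway down the cosine curve
print(cosine_annealing_lr(100, T_max, eta_min, eta_max))   # ~0.0  -- reaches eta_min when T_cur = T_max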

Major Parameters

T Max

T Max is the maximum number of iterations used in the aforementioned function.

Learning rate with different T max. Source: https://arxiv.org/abs/1608.03983v5

Note that the €€T_0€€ in the given figure is €€T_{max}€€.
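
As a rough illustration of how T Max controls the decay speed, the sketch below (using a throwaway linear model, an SGD optimizer, and arbitrary values chosen only for demonstration) compares the learning rate after 10 scheduler steps for two different T_max settings.

import torch
from torch import nn

def lr_after(steps, t_max):
    model = nn.Linear(2, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=t_max, eta_min=0)
    for _ in range(steps):
        optimizer.step()    # stands in for a real training step
        scheduler.step()
    return scheduler.get_last_lr()[0]

print(lr_after(10, t_max=10))   # ~0.0   -- the half-cosine is already complete
print(lr_after(10, t_max=100))  # ~0.098 -- the decay has barely begun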

Eta Min

Eta Min is the minimum learning rate achievable with the given cosine annealing schedule. In the figure above, the learning rate decays all the way to 0 before increasing again, so its eta min is 0.
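
If a nonzero floor is wanted instead, passing a positive eta_min keeps the schedule from decaying all the way to 0. A minimal sketch, again with an illustrative toy model and arbitrary values:

import torch
from torch import nn

model = nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# eta_min=0.001 sets the floor the cosine schedule decays to
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=0.001)

for _ in range(50):
    optimizer.step()    # stands in for a real training step
    scheduler.step()

print(scheduler.get_last_lr()[0])  # 0.001 -- the learning rate bottoms out at eta_min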

Code Implementation


import torch
from torch import nn

learning_rate = 0.01  # illustrative value
model = nn.Linear(2, 2)  # simple stand-in model
loss_fn = nn.MSELoss()
# dummy dataset of (input, target) batches for demonstration
dataset = [(torch.randn(8, 2), torch.randn(8, 2)) for _ in range(10)]

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1, verbose=False)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the cosine schedule once per epoch
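
Since scheduler.step() is called once per epoch here, T_max counts epochs in this setup; with T_max=1000 and only 20 epochs, just the very beginning of the cosine curve is traversed, so T_max should be chosen to match how often the scheduler is actually stepped.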
