CosineAnnealingLR is a scheduling technique that starts with a large learning rate, decreases it along a cosine curve to a value near 0, and then increases it again.

This variation of the learning rate follows the cosine annealing schedule. Mathematically, the learning rate is given as

$$
\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left( 1 + \cos\left( \frac{T_{cur}}{T_{max}} \pi \right) \right)
$$

where €€\eta_t€€ is the current learning rate, €€\eta_{max}€€ and €€\eta_{min}€€ are the maximum and minimum learning rates respectively, and €€T_{cur}€€ is the current number of accumulated epochs.

From the above equation, we can see that once €€T_{cur}=T_{max}€€, the learning rate becomes €€\eta_{min}€€.
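
To make the formula concrete, here is a minimal sketch that evaluates the closed-form schedule at a few values of €€T_{cur}€€; the learning-rate bounds and €€T_{max}€€ below are arbitrary example values, not ones prescribed by the schedule.

```
import math

# Example values chosen only for illustration
eta_max, eta_min = 1e-3, 0.0
T_max = 1000

def cosine_annealing_lr(t_cur):
    # eta_t = eta_min + 1/2 * (eta_max - eta_min) * (1 + cos(pi * t_cur / T_max))
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_max))

print(cosine_annealing_lr(0))     # start: eta_max (0.001)
print(cosine_annealing_lr(500))   # halfway: ~0.0005
print(cosine_annealing_lr(1000))  # end: eta_min (0.0)
```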

`T_max` is the maximum number of iterations used in the schedule above.

Note that the €€T_0€€ in the given figure is €€T_{max}€€.

`eta_min` is the minimum learning rate achievable with the given cosine annealing schedule. In the given figure, the learning rate goes to 0 and then increases again, so €€\eta_{min}€€ is 0.

```
import torch
import torch.nn as nn

# Toy model, loss, and data so the example is self-contained
model = nn.Linear(2, 2)
loss_fn = nn.MSELoss()
dataset = [(torch.randn(2), torch.randn(2)) for _ in range(10)]
learning_rate = 1e-3

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    # Advance the cosine schedule once per epoch
    scheduler.step()
```
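
After each call to `scheduler.step()`, the annealed rate can be read back from the scheduler or the optimizer; a quick check, assuming the objects defined above:

```
# Inspect the learning rate the scheduler has set
print(scheduler.get_last_lr())           # one entry per parameter group
print(optimizer.param_groups[0]["lr"])   # the same value, read from the optimizer
```

With `T_max=1000`, the reported rate reaches `eta_min` after 1,000 scheduler steps and then climbs back toward the initial rate along the same cosine curve.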
