CosineAnnealingLR is a learning rate scheduling technique that starts with a large learning rate, aggressively decreases it to a value near 0, and then increases it again.
This variation of the learning rate follows the cosine annealing schedule. Mathematically, the learning rate at step €€t€€ is given by
$$
\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left( 1 + \cos\left( \frac{T_{cur}}{T_{max}} \pi \right) \right)
$$
where €€\eta_t€€ is the current learning rate, €€\eta_{max}€€ and €€\eta_{min}€€ are the maximum and the minimum learning rates respectively, and €€T_{cur}€€ is the current number of accumulated epochs.
From the above equation, we can see that once €€T_{cur}=T_{max}€€, the learning rate becomes €€\eta_{min}€€.
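As a quick check of the formula, the snippet below evaluates it directly in Python. It is a minimal sketch; the values €€\eta_{max}=0.001€€, €€\eta_{min}=0€€, and €€T_{max}=1000€€ are illustrative assumptions, not values prescribed by the schedule itself.
import math

eta_max, eta_min, T_max = 1e-3, 0.0, 1000  # assumed example values

def cosine_annealing_lr(t_cur):
    # eta_t = eta_min + 1/2 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_max))
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_max))

print(cosine_annealing_lr(0))     # start:          0.001 (eta_max)
print(cosine_annealing_lr(500))   # halfway point:  0.0005
print(cosine_annealing_lr(1000))  # T_cur == T_max: 0.0 (eta_min)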
€€T_{max}€€ is the maximum number of iterations used in the aforementioned function.
Note that €€T_0€€ in the given figure is €€T_{max}€€.
€€\eta_{min}€€ is the minimum achievable learning rate with the given cosine annealing schedule. In the given figure, we can see that the learning rate goes to 0 and then increases again. Hence, €€\eta_{min}€€ is 0.
import torch
import torch.nn as nn

# Placeholder setup so the snippet runs end to end: a small linear model,
# a dummy dataset, and an MSE loss (assumed values for illustration)
model = nn.Linear(2, 2)
dataset = [(torch.randn(2), torch.randn(2)) for _ in range(100)]
loss_fn = nn.MSELoss()
learning_rate = 1e-3

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the cosine annealing schedule once per epoch
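To see the schedule in action, the scheduler can also be stepped on its own and the learning rate read back with get_last_lr(). The snippet below is a minimal sketch under assumed values (a single dummy parameter, an initial learning rate of 0.1, and a small €€T_{max}€€ of 5); it shows the learning rate falling to €€\eta_{min}€€ at €€T_{max}€€ and then rising again.
import torch

param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=5, eta_min=0.0)

for step in range(11):
    print(step, scheduler.get_last_lr())  # reaches eta_min (0.0) at step 5, then climbs back toward 0.1
    optimizer.step()
    scheduler.step()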