If you have ever worked on a Computer Vision project, you probably know that using a learning rate scheduler can significantly improve your model's training performance. On this page, we will:

  • Cover the Cosine Annealing Learning Rate (CosineAnnealingLR) scheduler;
  • Check out its parameters;
  • See the potential effect of CosineAnnealingLR on a learning curve;
  • And check out how to work with CosineAnnealingLR using Python and the PyTorch framework.

Let’s jump in.

CosineAnnealingLR is a scheduling technique that starts with a large learning rate and decreases it along a cosine curve to a value near 0, before raising it back up again.

Each time a “restart” occurs, training continues from the weights reached at the end of the previous “cycle” rather than starting from scratch. Thus, with each restart, the algorithm gets closer to the minimal loss.
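The restart behaviour described here corresponds to PyTorch's closely related CosineAnnealingWarmRestarts scheduler (plain CosineAnnealingLR implements only the annealing part of the cycle). Below is a minimal sketch of the cycling learning rate; the optimizer choice and the T_0 and eta_min values are illustrative assumptions, not part of the original example:

```python
import torch

# Single dummy parameter so the optimizer has something to manage
param = torch.nn.Parameter(torch.randn(2, 2))
optimizer = torch.optim.SGD([param], lr=0.1)

# Restart the learning rate every 5 epochs (T_0=5); values are illustrative
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, eta_min=0.001
)

for epoch in range(15):
    # ... one epoch of training would go here ...
    optimizer.step()
    scheduler.step()
    print(epoch, scheduler.get_last_lr())  # LR decays, then jumps back up every 5 epochs
```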

Below is the formula for the learning rate at each step:

η_t = η_min + ½ (η_max − η_min) (1 + cos((T_cur / T_max) · π))

In this formula:

  • η_min and η_max define the range for the learning rate, with η_max being set to the initial LR;
  • T_cur is the number of epochs that have run since the last restart;
  • T_max is the maximum number of iterations;
  • eta_min is the minimum learning rate achievable.
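To make the formula concrete, here is a minimal sketch that evaluates it directly in plain Python. The values η_max = 0.1, η_min = 0, and T_max = 100 are illustrative assumptions, not values from the text:

```python
import math

eta_max = 0.1   # assumed initial learning rate
eta_min = 0.0   # assumed minimum learning rate
T_max = 100     # assumed number of annealing steps

def cosine_annealing_lr(t_cur: int) -> float:
    """Learning rate at step t_cur according to the cosine annealing formula."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_max))

print(cosine_annealing_lr(0))    # 0.1   (starts at eta_max)
print(cosine_annealing_lr(50))   # 0.05  (halfway down the cosine curve)
print(cosine_annealing_lr(100))  # 0.0   (reaches eta_min)
```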
Here is how to set up CosineAnnealingLR with an AdamW optimizer in PyTorch (the model, dataset, and loss function below are simple stand-ins so the example runs end to end):

```python
import torch

# Simple stand-in model; any torch.nn.Module works here
model = torch.nn.Linear(2, 2)
# Dummy dataset and loss function so the example runs end to end
dataset = [(torch.randn(4, 2), torch.randn(4, 2)) for _ in range(10)]
loss_fn = torch.nn.MSELoss()

learning_rate = 1e-3
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
# T_max: number of scheduler steps over which the LR is annealed down to eta_min
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000, eta_min=0, last_epoch=-1)

for epoch in range(20):
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # update the learning rate once per epoch
```
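In this sketch, scheduler.step() is called once per epoch, so T_max is measured in epochs; if you call it after every batch instead, T_max should be set to the total number of batches over which you want the learning rate to decay. At any point you can check the current learning rate with scheduler.get_last_lr().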
    
