CosineAnnealingLR

If you have ever worked on a Computer Vision project, you probably know that a learning rate scheduler can noticeably improve how well and how quickly a model trains. On this page, we will:

  • Cover the Cosine Annealing Learning Rate (CosineAnnealingLR) scheduler;
  • Check out its parameters;
  • See a potential effect from CosineAnnealingLR on a learning curve;
  • And check out how to work with CosineAnnealingLR using Python and the PyTorch framework.

Let’s jump in.

CosineAnnealingLR is a scheduling technique that starts with a relatively large learning rate and then decreases it along a cosine curve to a minimum value near 0 before increasing the learning rate again.

Each time a “restart” occurs, the weights reached at the end of the previous “cycle” serve as the starting point. Thus, with each restart, the model can move closer to the minimum of the loss.
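
To make the restart behavior concrete, here is a minimal sketch. Note that in PyTorch, CosineAnnealingLR itself follows a smooth cosine curve, while abrupt restarts are provided by a separate scheduler, CosineAnnealingWarmRestarts. The dummy parameter, the SGD optimizer, and the values T_0=5 and lr=0.1 are assumptions chosen only for illustration.

```python
import torch

# A single dummy parameter so the optimizer has something to manage (illustration only)
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)

# T_0 is the length of the first cycle; T_mult=1 keeps every cycle the same length
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=5, T_mult=1, eta_min=0.0
)

lrs = []
for step in range(10):
    lrs.append(scheduler.get_last_lr()[0])  # learning rate in effect at this step
    optimizer.step()   # no gradients exist, so nothing is updated; keeps the call order valid
    scheduler.step()

print([round(lr, 4) for lr in lrs])
```

In this sketch the learning rate falls during steps 0 to 4 and jumps back to its initial value at step 5, where the second cycle begins.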

Below is the formula for the learning rate at each step:

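$$\eta_t = \eta_{min} + \frac{1}{2}\left(\eta_{max} - \eta_{min}\right)\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)$$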

In this formula:

  • η_min and η_max are the lower and upper bounds of the learning rate, with η_max being set to the initial LR;
  • T_cur represents the number of epochs that were run since the last restart.
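
As a rough illustration, the cosine annealing formula, in the closed form given in the PyTorch documentation, can be written as a plain Python function. The function name `cosine_annealing_lr` and the example values (10 steps, learning rate from 0.1 down to 0.001) are hypothetical and not part of any library.

```python
import math

def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Learning rate after t_cur steps of a cosine schedule of length t_max.

    Implements: eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * t_cur / t_max))
    """
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

# Evaluate the schedule at every step from 0 to t_max (inclusive)
schedule = [cosine_annealing_lr(t, t_max=10, eta_max=0.1, eta_min=0.001) for t in range(11)]
print([round(lr, 4) for lr in schedule])
```

At T_cur = 0 the cosine term equals 1, so the learning rate is η_max; at T_cur = T_max it equals -1, so the learning rate is η_min; halfway through, the learning rate is the average of the two.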

Parameters

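  • T_max - the maximum number of iterations, that is, the number of scheduler steps over which the learning rate goes from its initial value down to eta_min;
  • eta_min - the minimum learning rate achievable (0 by default).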
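
To see how these two parameters shape the schedule, the sketch below steps a scheduler through one full cycle and records the learning rate at each step. The dummy parameter, the SGD optimizer, and the values T_max=10, eta_min=0.001, and lr=0.1 are assumptions made only for this illustration.

```python
import torch

# A single dummy parameter so the optimizer has something to manage (illustration only)
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10, eta_min=0.001)

lrs = []
for step in range(11):
    lrs.append(scheduler.get_last_lr()[0])  # learning rate in effect at this step
    optimizer.step()   # no gradients exist, so nothing is updated; keeps the call order valid
    scheduler.step()

print([round(lr, 4) for lr in lrs])
```

The first recorded value is the optimizer's initial learning rate, and the value after T_max scheduler steps is eta_min. A larger T_max stretches the same curve over more steps.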

A minimal PyTorch training loop that uses CosineAnnealingLR:

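    import torch
    from torch import nn

    # Toy model and synthetic data so that the example runs on its own
    model = nn.Linear(2, 1)
    loss_fn = nn.MSELoss()
    dataset = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(10)]

    learning_rate = 1e-3
    num_epochs = 20
    optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
    # T_max counts scheduler.step() calls; the scheduler is stepped once per epoch here
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0)

    for epoch in range(num_epochs):
        for inputs, target in dataset:
            optimizer.zero_grad()
            output = model(inputs)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
        scheduler.step()  # update the learning rate once per epoch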
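
The loop above steps the scheduler once per epoch. A common variation, sketched below, steps it once per batch instead, in which case T_max should count batches rather than epochs. The toy model, the synthetic `batches` list, and all hyperparameter values are assumptions for illustration only.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)
loss_fn = nn.MSELoss()
# Synthetic batches standing in for a real DataLoader (assumption)
batches = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(5)]

num_epochs = 4
total_steps = num_epochs * len(batches)  # one scheduler step per batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps, eta_min=1e-4)

for epoch in range(num_epochs):
    for inputs, target in batches:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), target)
        loss.backward()
        optimizer.step()
        scheduler.step()  # called after optimizer.step(), once per batch

final_lr = scheduler.get_last_lr()[0]
print(f"final learning rate: {final_lr:.6f}")
```

With T_max equal to the total number of batches, the learning rate reaches eta_min exactly when training ends.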
    
  
Last updated on Dec 21, 2022
