CosineAnnealingLR

If you have ever worked on a Computer Vision project, you probably know that a learning rate scheduler can significantly improve model training. On this page, we will:

Cover the Cosine Annealing Learning Rate (CosineAnnealingLR) scheduler;
Check out its parameters;
See the potential effect of CosineAnnealingLR on a learning curve;
And check out how to work with CosineAnnealingLR using Python and the PyTorch framework.

Let’s jump in.

CosineAnnealingLR explained

CosineAnnealingLR is a scheduling technique that starts with a relatively large learning rate and then decreases it along a cosine curve to a value near 0 before increasing the learning rate again.

Each time the “restart” occurs, we take the good weights from the previous “cycle” as the starting point. Thus, with each restart, the algorithm gets closer to the minimal loss.
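The restart behaviour described above can be sketched in plain Python. This is a minimal illustration, assuming a fixed cycle length; the function name `lr_with_restarts` and the values of `eta_max`, `eta_min`, and `cycle_len` are hypothetical and chosen only for the example. In PyTorch, periodic restarts of this kind are provided by the separate `CosineAnnealingWarmRestarts` scheduler.

```python
import math

def lr_with_restarts(step, eta_max=0.1, eta_min=0.0, cycle_len=10):
    """Cosine-annealed learning rate that restarts every `cycle_len` steps."""
    t_cur = step % cycle_len  # steps since the last restart
    cosine = math.cos(math.pi * t_cur / cycle_len)
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + cosine)

# Trace two full cycles: the rate decays within a cycle, then jumps back up.
schedule = [lr_with_restarts(step) for step in range(20)]
for step, lr in enumerate(schedule):
    print(f"step {step:2d}  lr = {lr:.4f}")
```

At step 10 the learning rate returns to `eta_max`, while the model weights carry over from the previous cycle.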

Below is the formula for the learning rate at each step:

η_t = η_min + 1/2 · (η_max − η_min) · (1 + cos(π · T_cur / T_max))

In this formula:

η_min and η_max are the lower and upper bounds of the learning rate, with η_max being set to the initial LR;
T_cur represents the number of epochs that were run since the last restart.
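To make the roles of these symbols concrete, here is a minimal sketch in plain Python, assuming the standard closed-form cosine schedule. The function name `cosine_annealing_lr` and the numeric values are hypothetical; `t_max` stands for the cycle length in steps.

```python
import math

def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Learning rate after `t_cur` of `t_max` steps on a cosine schedule."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

eta_max, eta_min, t_max = 0.1, 0.001, 100
start = cosine_annealing_lr(0, t_max, eta_max, eta_min)     # equals eta_max
middle = cosine_annealing_lr(50, t_max, eta_max, eta_min)   # midpoint of the range
end = cosine_annealing_lr(100, t_max, eta_max, eta_min)     # equals eta_min
print(start, middle, end)
```

The rate starts at the upper bound, passes through the midpoint of the range halfway through the cycle, and reaches the lower bound at the end.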

Parameters

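T_max - the maximum number of iterations, i.e. the number of scheduler steps it takes the learning rate to go from its initial value down to eta_min;
eta_min - the minimum learning rate achievable (0 by default).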
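Because T_max is counted in scheduler steps, its value should match how often the scheduler is stepped. The sketch below uses plain Python and a hypothetical training setup (20 epochs, 50 batches per epoch) to show two reasonable choices of T_max, and how little the rate decays when T_max far exceeds the number of steps actually taken. The helper `final_lr_fraction` is an illustration only and assumes eta_min = 0.

```python
import math

def final_lr_fraction(steps_taken, t_max):
    """Fraction of the initial rate left after `steps_taken` steps (eta_min = 0)."""
    return 0.5 * (1 + math.cos(math.pi * steps_taken / t_max))

num_epochs, batches_per_epoch = 20, 50  # hypothetical training setup

t_max_per_epoch = num_epochs                      # scheduler.step() once per epoch
t_max_per_batch = num_epochs * batches_per_epoch  # scheduler.step() once per batch

matched = final_lr_fraction(num_epochs, t_max_per_epoch)  # schedule completes
mismatched = final_lr_fraction(num_epochs, 1000)          # schedule barely starts
print(t_max_per_batch, matched, round(mismatched, 4))
```

With 20 steps and T_max set to 1000, the learning rate finishes at roughly 99.9% of its initial value, so the annealing has almost no effect.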

CosineAnnealingLR visualized
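The shape of the schedule can also be traced as text. The sketch below is a plain-Python approximation of a single annealing cycle, assuming the closed-form cosine schedule; `t_max` and `eta_max` are hypothetical values picked for illustration.

```python
import math

def cosine_annealing_lr(t_cur, t_max, eta_max, eta_min=0.0):
    """Closed-form cosine annealing from eta_max down to eta_min."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

t_max, eta_max = 20, 0.1  # hypothetical values for illustration
curve = [cosine_annealing_lr(t, t_max, eta_max) for t in range(t_max + 1)]

# One row per step; bar length is proportional to the learning rate.
for t, lr in enumerate(curve):
    bar = "#" * round(40 * lr / eta_max)
    print(f"{t:2d} {lr:.4f} {bar}")
```

The bars shrink slowly at first, fastest around the middle of the cycle, and slowly again near the minimum, which is the characteristic half-cosine shape.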

Code Implementation

  
Hello, thank you for using the code provided by CloudFactory. Please note that some code blocks might not be 100% complete and ready to be run as is. This is done intentionally as we focus on implementing only the most challenging parts that might be tough to pick up from scratch. View our code block as a LEGO block - you can’t use it as a standalone solution, but you can take it and add it to your system to complement it.

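python

import torch
from torch import nn

# A toy model and synthetic data stand in for a real network and DataLoader.
model = nn.Linear(2, 1)
dataset = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(10)]
loss_fn = nn.MSELoss()
learning_rate = 1e-3
num_epochs = 20

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
# T_max counts scheduler.step() calls; it matches num_epochs here so the cycle completes.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs, eta_min=0, last_epoch=-1)

for epoch in range(num_epochs):
    for inputs, targets in dataset:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_fn(outputs, targets)
        loss.backward()
        optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch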

Learn more about other schedulers…


