If you have ever worked on a Computer Vision project, you probably know that using a learning rate scheduler can significantly improve your model's training results. On this page, we will:

  • Cover the Multi-Step Learning Rate (MultiStepLR) scheduler;
  • Check out its parameters;
  • See the potential effect of MultiStepLR on a learning curve;
  • And check out how to work with MultiStepLR using Python and the PyTorch framework.

Let’s jump in.

The MultiStepLR is a scheduling technique that decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones.

Compared to StepLR, which reduces the learning rate every N epochs, MultiStepLR lets us specify exactly when the learning rate should be decreased. For instance, we might specify that it should be scaled by gamma after the 2nd, 8th, and 10th epochs. The scheduler takes two key parameters:
  • Milestones - the indices (numbers) of the epochs after which the learning rate is reduced; the milestone values should be placed in increasing order;
  • Gamma - a multiplicative factor by which the learning rate is decayed. For instance, if the learning rate is 1000 and gamma is 0.5, the new learning rate will be 1000 x 0.5 = 500. The gamma value should be less than 1 to reduce the learning rate.
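To make these two parameters concrete, here is a minimal sketch (the linear model, SGD optimizer, and starting rate of 0.1 are arbitrary placeholders) that prints the learning rate after every epoch; with milestones=[2, 8, 10] and gamma=0.5, the rate is halved three times:

```python
import torch
from torch import nn

# Hypothetical toy setup: the model, optimizer, and starting rate are
# arbitrary choices made only to illustrate milestones and gamma.
model = nn.Linear(2, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 8, 10], gamma=0.5)

for epoch in range(12):
    # ... one epoch of training would normally happen here ...
    optimizer.step()    # optimizer step first, then scheduler step
    scheduler.step()
    print(epoch + 1, scheduler.get_last_lr())  # rate halves after epochs 2, 8, and 10
```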
Figure: Learning rate over epochs with MultiStepLR applied.
Here is how to apply MultiStepLR in PyTorch. The toy model, dummy dataset, and loss function below are placeholders added so the snippet runs end to end; in practice, you would plug in your own.

```python
import torch
from torch import nn

# A small model, dummy dataset, and loss function so the snippet runs as written
model = nn.Linear(2, 2)
dataset = [(torch.randn(8, 2), torch.randn(8, 2)) for _ in range(10)]
loss_fn = nn.MSELoss()

learning_rate = 0.01
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate, weight_decay=0.01, amsgrad=False)
# Decay the learning rate by gamma=0.1 once at epoch 30 and again at epoch 80
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1, last_epoch=-1)

for epoch in range(100):  # train past both milestones
    for input, target in dataset:
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
```
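Note that scheduler.step() is called once per epoch, after the inner batch loop, so the milestone counter tracks epochs rather than batches. Calling it inside the batch loop would trigger the decays at the 30th and 80th batches instead, far earlier than intended.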
    
