If you have ever worked on a Computer Vision project, you probably know that a learning rate scheduler can significantly improve your model's training performance. On this page, we will:

  • Cover the Multi-Step Learning Rate (MultiStepLR) scheduler;
  • Check out its parameters;
  • See a potential effect from MultiStepLR on a learning curve;
  • And check out how to work with MultiStepLR using Python and the PyTorch framework.

Let’s jump in.

The MultiStepLR is a scheduling technique that decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones.

Unlike StepLR, which reduces the learning rate every N epochs, MultiStepLR lets us specify exactly when the learning rate should decrease. For instance, we might specify that the learning rate should be scaled by gamma after the 2nd, 8th, and 10th epochs.
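To make the difference concrete, here is a minimal pure-Python sketch of the two decay rules. The helper names step_lr and multi_step_lr are hypothetical and written only for this illustration (they are not PyTorch functions), and the starting learning rate of 1.0, gamma of 0.5, and StepLR step size of 4 are assumed values.

```python
from bisect import bisect_right

def step_lr(base_lr, gamma, step_size, epoch):
    # StepLR-style rule: one decay every `step_size` epochs.
    return base_lr * gamma ** (epoch // step_size)

def multi_step_lr(base_lr, gamma, milestones, epoch):
    # MultiStepLR-style rule: one decay for every milestone already reached.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)

# Illustrative values only (epochs are counted from 0).
step_schedule = [step_lr(1.0, 0.5, 4, e) for e in range(12)]
multi_schedule = [multi_step_lr(1.0, 0.5, [2, 8, 10], e) for e in range(12)]

for epoch, (s, m) in enumerate(zip(step_schedule, multi_schedule)):
    print(f"epoch {epoch:2d}  StepLR: {s:<6}  MultiStepLR: {m}")
```

With these values, the StepLR-style schedule halves at fixed intervals (epochs 4 and 8), while the MultiStepLR-style schedule halves at epochs 2, 8, and 10, exactly where the milestones are placed.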


  • Milestones - the indices (numbers) of the epochs after which the learning rate is reduced. The milestone values should be listed in increasing order;
  • Gamma - the multiplicative factor by which the learning rate is decayed. For instance, if the learning rate is 1000 and gamma is 0.5, the new learning rate will be 1000 × 0.5 = 500. The gamma value should be less than 1 to reduce the learning rate.
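As a rough illustration of how the two parameters interact, the sketch below records the learning rate that PyTorch reports for each epoch. It reuses the numbers from the examples above (a learning rate of 1000, gamma of 0.5, milestones 2, 8, and 10). The single dummy parameter and the SGD optimizer are assumptions made only to keep the example small.

```python
import torch

# Hypothetical setup: one dummy parameter and plain SGD, for illustration only.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.SGD(params, lr=1000.0)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[2, 8, 10], gamma=0.5
)

lrs = []
for epoch in range(12):
    lrs.append(scheduler.get_last_lr()[0])  # learning rate in effect for this epoch
    optimizer.step()   # stands in for a full epoch of training
    scheduler.step()   # advance the schedule once per epoch

for epoch, lr in enumerate(lrs):
    print(epoch, lr)
```

Epochs are counted from 0 here, so the learning rate stays at 1000 for epochs 0 and 1, drops to 500 at epoch 2, to 250 at epoch 8, and to 125 at epoch 10.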
Learning rate over epochs with MultiStepLR applied
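    import torch

    # Placeholder model, data, and loss so the snippet runs; swap in your own.
    model = torch.nn.Linear(2, 2)
    dataset = [(torch.randn(4, 2), torch.randn(4, 2))]
    loss_fn = torch.nn.MSELoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.01, weight_decay=0.01, amsgrad=False)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1, last_epoch=-1)
    for epoch in range(100):  # long enough to pass both milestones
        for input, target in dataset:
            optimizer.zero_grad()
            output = model(input)
            loss = loss_fn(output, target)
            loss.backward()
            optimizer.step()
        scheduler.step()  # once per epoch, after the optimizer updates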
Last updated on Dec 21, 2022
