Momentum (SGD)

Momentum helps the SGD optimizer reach a local minimum faster. As long as successive updates keep pointing in the same direction on the loss landscape, the optimizer takes progressively bigger steps.

A nice side effect of momentum is that it smooths the path SGD takes when the gradients of successive iterations point in different directions.


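Momentum is implemented by adding a term to the SGD weight update rule that carries over a fraction of the previous update, controlled by a parameter γ ranging between 0 and 1. Do not set γ to 1: the contribution of past gradients then never decays, so the optimizer tends to oscillate around the minimum or diverge instead of settling.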

A typical value for the momentum parameter that has proved robust is 0.9.
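
To make the update rule concrete, here is a minimal sketch for a single scalar weight. It assumes the classical formulation, in which a velocity v accumulates γ times the previous velocity plus the learning rate times the current gradient; the function names and the learning rate of 0.1 are illustrative choices, not taken from any particular library.

```python
def sgd_momentum_step(w, v, grad, lr=0.1, gamma=0.9):
    """One SGD-with-momentum update for a scalar weight (illustrative sketch).

    v_new = gamma * v + lr * grad   # carry over a fraction of the last update
    w_new = w - v_new               # move against the accumulated direction
    """
    v_new = gamma * v + lr * grad
    w_new = w - v_new
    return w_new, v_new


def velocity_after(steps, grad=1.0, lr=0.1, gamma=0.9):
    """Velocity reached when the gradient stays constant for `steps` updates."""
    w, v = 0.0, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, v, grad, lr, gamma)
    return v
```

With γ = 0 the rule reduces to plain SGD. If the gradient stays constant, the velocity is a geometric series that approaches lr · grad / (1 − γ), so γ = 0.9 yields steps up to ten times larger than plain SGD, while γ = 1 would let the step grow without bound.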

Without dampening, momentum can cause the optimizer to miss the minimum: the accumulated step size may grow so large that the optimizer 'jumps over' it.
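
The overshoot can be reproduced on a toy problem. The sketch below minimizes f(w) = w² starting from w = 1 and records the lowest value w reaches; anything below 0 means the optimizer jumped past the minimum. It assumes a dampening factor that scales down the gradient's contribution to the velocity, loosely modeled on the `dampening` argument of PyTorch's `torch.optim.SGD`, with first-step handling simplified. The function name and all hyperparameter values are hypothetical.

```python
def run_momentum(lr=0.1, gamma=0.9, dampening=0.0, steps=100, w0=1.0):
    """Minimize f(w) = w**2 with momentum; return (final_w, lowest_w_seen)."""
    w, v = w0, 0.0
    lowest = w0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        # Dampening shrinks how much the current gradient adds to the velocity.
        v = gamma * v + lr * (1.0 - dampening) * grad
        w = w - v
        lowest = min(lowest, w)
    return w, lowest


final_plain, lowest_plain = run_momentum(dampening=0.0)
final_damped, lowest_damped = run_momentum(dampening=0.9)
print("lowest w without dampening:", lowest_plain)
print("lowest w with dampening:   ", lowest_damped)
```

In this setup both runs end up close to the minimum at w = 0, but the run without dampening swings much further past it on the way. The trade-off is that dampening also slows the initial approach.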

Last updated on Dec 19, 2022
