We shall make use of Adam optimization to briefly explain the epsilon coefficient. For the Adam optimizer, the first and second moments are calculated as:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2

Here g_t is the derivative of the loss function with respect to a parameter, m_t is the exponentially decaying average of past gradients (the momentum term), and v_t is the exponentially decaying average of past squared gradients.
And the parameter updates are done as follows:

\hat{m}_t = m_t / (1 - \beta_1^t),  \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_{t+1} = \theta_t - \eta \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)

The \epsilon in the update above is the epsilon coefficient.
Note that when the bias-corrected second moment \hat{v}_t gets close to zero, the denominator \sqrt{\hat{v}_t} also approaches zero, so the update blows up (and is undefined when \hat{v}_t is exactly zero). To rectify this, we add a small epsilon to the denominator, which keeps the division numerically stable.
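To make the role of the epsilon coefficient concrete, here is a minimal from-scratch sketch of a single Adam step following the equations above. The helper adam_step and the values chosen for lr, beta1, beta2, and eps are illustrative assumptions, not part of the PyTorch example that follows.

import torch

# A minimal sketch of one Adam update for a single parameter tensor, assuming
# typical values lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8 (illustrative only).
def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Decaying averages of the gradients and of the squared gradients.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected first and second moments.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The epsilon coefficient sits in the denominator; without it, a zero
    # v_hat would make this division 0/0 (nan).
    param = param - lr * m_hat / (torch.sqrt(v_hat) + eps)
    return param, m, v

# Toy usage: a gradient of exactly zero still produces a well-defined update.
p = torch.tensor([1.0])
g = torch.zeros_like(p)
m0, v0 = torch.zeros_like(p), torch.zeros_like(p)
p, m0, v0 = adam_step(p, g, m0, v0, t=1)
print(p)  # finite; with eps=0 this step would produce nan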
import torch

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random Tensors to hold inputs and outputs.
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# Use the nn package to define our model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which Tensors it should update.
learning_rate = 1e-4

# Note that we are using Adam in our example, with amsgrad set to True and
# the epsilon coefficient set to 1e-08.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate,
                             amsgrad=True, eps=1e-08)

for t in range(500):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(x)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    print(t, loss.item())

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the Tensors it will update (which are the learnable weights
    # of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model parameters.
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its parameters.
    optimizer.step()