Advantages of Stochastic Gradient Descent

  1. It fits more easily in memory, since the network processes only a single training example at a time.
  2. Each parameter update is computationally cheap, as only one sample is processed per step (a minimal per-sample update loop is sketched after this list).
  3. For larger datasets, it can converge faster because the parameters are updated far more frequently.
  4. The frequent updates make the steps towards the minimum of the loss function noisy and oscillatory, which can help the optimizer escape a local minimum of the loss function (if the current position happens to be one).
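
To make the per-sample update concrete, here is a minimal sketch of SGD on a least-squares linear model. The data, learning rate, and epoch count are illustrative assumptions, not values from the text above.

```python
# Minimal per-sample SGD on least-squares linear regression (illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))               # 1000 examples, 5 features (assumed sizes)
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)                              # model parameters
lr = 0.01                                    # learning rate (assumed value)

for epoch in range(5):
    for i in rng.permutation(len(X)):        # shuffle, then visit one example at a time
        error = X[i] @ w - y[i]              # scalar residual for this single example
        grad = error * X[i]                  # gradient of 0.5 * error**2 with respect to w
        w -= lr * grad                       # frequent, cheap parameter update

print(w)                                     # should land close to true_w
```

Only one row of X is touched per update, which is why memory use stays small and each individual step is fast.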

Disadvantages of Stochastic Gradient Descent

  1. Due to the frequent updates, the steps taken towards the minimum are very noisy, which can repeatedly steer the descent in other directions.
  2. Because of these noisy steps, it may also take longer to converge to the minimum of the loss function.
  3. The frequent updates are computationally expensive in aggregate, since the full overhead of processing a training sample is paid for every single example.
  4. It loses the advantage of vectorized operations, since it deals with only a single example at a time (see the rough timing sketch after this list).
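
The last point can be illustrated with a rough sketch (an illustration, not a rigorous benchmark): computing the gradient for 256 examples one at a time in a Python loop versus one vectorized mini-batch gradient over the same examples. The array sizes here are assumptions chosen only for demonstration.

```python
# Per-sample loop versus one vectorized mini-batch gradient (illustrative).
import time
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 100))              # 256 examples, 100 features (assumed sizes)
y = rng.normal(size=256)
w = np.zeros(100)

t0 = time.perf_counter()
for i in range(len(X)):                      # SGD style: one gradient per example
    g = (X[i] @ w - y[i]) * X[i]
loop_time = time.perf_counter() - t0

t0 = time.perf_counter()
g = X.T @ (X @ w - y) / len(X)               # one vectorized gradient over the whole batch
batch_time = time.perf_counter() - t0

print(f"per-sample loop: {loop_time:.6f}s, vectorized batch: {batch_time:.6f}s")
```

The vectorized version performs the same arithmetic in a handful of matrix operations, which is exactly the advantage that per-sample SGD gives up.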