Advantages of Batch Gradient Descent

  1. It takes fewer oscillating, noisy steps towards the minimum of the loss function, because each parameter update is computed from the average gradient over all training samples rather than from a single sample.
  2. It benefits from vectorization, which speeds up processing all training samples together.
  3. It produces a more stable error gradient and more stable convergence than stochastic gradient descent.
  4. It is computationally efficient, since compute resources are spent processing all training samples at once rather than one sample at a time.
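The averaging and vectorization described above can be sketched with a minimal NumPy example: a hypothetical linear-regression fit where every update uses one vectorized gradient computed over the entire training set (the data, learning rate, and iteration count here are illustrative assumptions, not from the original text).

```python
import numpy as np

# Hypothetical example: fit y = 2x + 1 with batch gradient descent.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 * X[:, 0] + 1.0

# Add a bias column so the model is y_hat = X_b @ w.
X_b = np.hstack([X, np.ones((X.shape[0], 1))])
w = np.zeros(2)
lr = 0.5

for _ in range(200):
    # Vectorized: one prediction and one gradient over ALL samples per step.
    y_hat = X_b @ w
    grad = X_b.T @ (y_hat - y) / len(y)  # average gradient over the full batch
    w -= lr * grad

print(np.round(w, 3))  # → close to [2. 1.]
```

Because the gradient is averaged over all 100 samples, each step is smooth and deterministic; there is none of the sample-to-sample noise a single-sample update would show.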

Disadvantages of Batch Gradient Descent

  1. The stable error gradient can cause convergence to a poor local minimum; unlike stochastic gradient descent, there are no noisy steps to help escape it.
  2. The entire training set may be too large to fit in memory, so additional memory may be needed.
  3. Depending on the available compute resources, processing all training samples as a single batch can take too long.
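One way to work around the memory limitation above, while still performing true batch gradient descent, is to accumulate the full-batch gradient over chunks of the data so the whole training set never needs to be in memory at once. The following is a hedged sketch (the data, chunk size, and helper function `full_batch_gradient` are illustrative assumptions):

```python
import numpy as np

# Hypothetical data: 10,000 samples of noiseless linear data.
rng = np.random.default_rng(1)
X = rng.normal(size=(10_000, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w

w = np.zeros(3)

def full_batch_gradient(X, y, w, chunk=1000):
    """Accumulate the batch gradient chunk by chunk; the result equals
    the gradient computed over all samples at once."""
    grad = np.zeros_like(w)
    for start in range(0, len(y), chunk):
        Xc, yc = X[start:start + chunk], y[start:start + chunk]
        grad += Xc.T @ (Xc @ w - yc)
    return grad / len(y)

# The chunked gradient matches the in-memory full-batch gradient.
g_chunked = full_batch_gradient(X, y, w)
g_direct = X.T @ (X @ w - y) / len(y)
print(np.allclose(g_chunked, g_direct))  # → True
```

The update rule is unchanged; only the gradient computation is streamed, so this trades extra passes over storage for a smaller memory footprint.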