Splitting Dataset in Machine Learning

Posted on Aug. 17, 2020

In machine learning, a dataset is usually split into three parts:

Training Data

  • Training data is used to train the model.
  • This is the data the model actually sees (both inputs and outputs) and learns from.
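
As a minimal sketch of what "seeing both inputs and outputs" means, the snippet below fits a one-variable linear model to training pairs by ordinary least squares. The function name `fit_line` and the toy data are assumptions made for illustration; they are not part of the original tutorial.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: the outputs follow y = 2x + 1 exactly.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

# The model learns its parameters from inputs AND outputs together.
slope, intercept = fit_line(train_x, train_y)
print(slope, intercept)
```

Real models are usually far more complex, but the pattern is the same: the training step receives the known outputs and adjusts the model's parameters to match them.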

Validation Data

  • Validation data is used to evaluate the model frequently while it is being fit on the training data, and to tune its hyperparameters.
  • This data comes into play while the model is still being trained.
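
One way to picture hyperparameter tuning with validation data is the sketch below, which picks the number of neighbours `k` for a tiny nearest-neighbour classifier. The classifier, the helper names (`knn_predict`, `accuracy`), the candidate values, and the data are all assumptions chosen for illustration.

```python
def knn_predict(train, x, k):
    """Predict a label for x by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train, data, k):
    """Fraction of points in `data` whose label is predicted correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in data)
    return correct / len(data)


# Hypothetical 1-D data as (input, label) pairs; (2.0, 1) is a noisy point.
train = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
validation = [(1.8, 0), (2.3, 0), (6.5, 1), (8.5, 1)]

# Score each candidate hyperparameter on the validation data only.
candidate_ks = [1, 3, 5]
scores = {k: accuracy(train, validation, k) for k in candidate_ks}

# Keep the first candidate with the highest validation accuracy.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The model is never fit on the validation pairs here; they only decide which setting of `k` to keep.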

Testing Data

  • Once the model is completely trained, testing data provides an unbiased evaluation.
  • When we feed in the inputs of the testing data, the model predicts values without seeing the actual outputs.
  • After prediction, we evaluate the model by comparing its predictions with the actual outputs in the testing data.
  • This shows how much the model has learned from the examples fed in as training data.
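
The whole workflow can be sketched roughly as follows, using only the Python standard library. The 70/15/15 split ratio, the helper names (`split_dataset`, `fit_slope`, `mean_squared_error`), and the toy data are assumptions for illustration; libraries such as scikit-learn provide ready-made utilities for the same job.

```python
import random


def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows and cut them into train, validation, and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


def fit_slope(data):
    """Least-squares slope for a line through the origin, y = slope * x."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)


def mean_squared_error(slope, data):
    """Average squared gap between predictions and actual outputs."""
    return sum((slope * x - y) ** 2 for x, y in data) / len(data)


# Hypothetical dataset: 20 (input, output) pairs following y = 3x.
rows = [(float(x), 3.0 * x) for x in range(20)]
train, val, test = split_dataset(rows)

# Fit on the training pairs only.
slope = fit_slope(train)

# Predict from the test inputs, then compare with the held-out outputs.
test_mse = mean_squared_error(slope, test)
print(len(train), len(val), len(test), test_mse)
```

The validation list is unused in this sketch because there is no hyperparameter to tune; the point is that the test pairs play no role in fitting and are touched only for the final comparison.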