When applying a machine learning algorithm, the data must be split so that
the model is trained on one portion and evaluated on data it did not see
during training. The loss on that held-out data indicates how well the
model generalizes.
The input data is split into three groups:
Training set: the data used to train the model, i.e., to search for the parameters of the model.
Validation set: the data used to
tune the hyperparameters of the model. For example, the validation set
can be used to select the number of layers in a neural network.
Test set: the data used to assess the performance of the final model. It is kept out of both training and hyperparameter tuning.
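
The three roles described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the synthetic data, the one-parameter model y = w * x, the L2 penalty `lam` standing in for a hyperparameter, and the helper names `fit_slope` and `mse`. The training set fits the parameter `w`, the validation set picks `lam`, and the test set is touched only once at the end.

```python
import random

def fit_slope(points, lam):
    """Training step: closed-form slope w for y = w * x with an L2 penalty lam."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, y in points)
    return sxy / (sxx + lam)

def mse(points, w):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)

# Synthetic data (assumed for this sketch): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(200)]
data = [(x, 2.0 * x + rng.gauss(0, 0.5)) for x in xs]

# The points are already in random order, so plain slicing gives an 80/10/10 split.
train_pts, val_pts, test_pts = data[:160], data[160:180], data[180:]

# Hyperparameter tuning: fit on the training set, compare on the validation set.
candidates = [0.0, 0.1, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: mse(val_pts, fit_slope(train_pts, lam)))

# Final assessment: the test set is used once, after all choices are made.
final_w = fit_slope(train_pts, best_lam)
test_error = mse(test_pts, final_w)
print("chosen lam:", best_lam, "slope:", final_w, "test MSE:", test_error)
```

Because `best_lam` was chosen using the validation set, the validation error is an optimistic estimate; the test error is the one to report.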
Common ratios used for the split:
80% training set, 10% validation set, 10% test set
70% training set, 15% validation set, 15% test set
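
A split with these ratios might be implemented along the following lines. This is a minimal sketch using only the Python standard library; the function name `split_dataset` and its parameters are hypothetical, and it assumes the samples can be shuffled freely (no time ordering or grouping that must be preserved).

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle samples and split them into train, validation, and test lists.

    The test set receives whatever remains after the train and validation
    portions are taken, so the three parts always cover the whole input.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# 80% / 10% / 10% split of 100 samples
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))

# 70% / 15% / 15% split of the same samples
train70, val15, test15 = split_dataset(range(100), train_frac=0.7, val_frac=0.15)
print(len(train70), len(val15), len(test15))
```

Shuffling before slicing avoids a split that merely reflects the order in which the data was collected. Libraries such as scikit-learn provide similar helpers; the sketch above only shows the idea.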