- To test the model, we need to split the data into a training set (70%) and a testing set (30%). But the splitting doesn't stop there: we split further so we can tune more parameters (see the sketch after the bullet points below).
(Confusing? Don't worry, I've got the bullet points.)
- Used to tune the model's hyperparameters and evaluate its performance during training.
- Helps select the best model when many models are trained on the same training dataset.
- Helps prevent overfitting, since performance is checked on data the model never trained on.
- Hyperparameters are settings such as the learning rate or the number of layers in a neural network.
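Here is a minimal sketch of what this looks like in practice, assuming scikit-learn and a synthetic dataset: a 70/15/15 train/dev/test split, with the dev set used to pick one hyperparameter. The regularization strength `C` here is just a stand-in for the learning rate or layer count mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: any feature matrix X and label vector y would do.
X, y = make_classification(n_samples=1000, random_state=42)

# First split off 70% for training, leaving 30% for dev + test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Split the remaining 30% in half: 15% dev, 15% test.
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=42)

# Use the dev set to choose a hyperparameter (C is illustrative only).
best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_dev, y_dev)
    if score > best_score:
        best_C, best_score = C, score

# Only the final, dev-selected model ever touches the test set.
final_model = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("dev-selected C:", best_C, "test accuracy:", final_model.score(X_test, y_test))
```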
The problem with the dev set
The dev set is usually small, around 15% of the total data, whereas the training set is huge. If a model doesn't perform well on the dev set but was performing very well on the training set, that doesn't necessarily mean the model is bad. It's like training an athlete for a marathon but picking the winner based on a sprint. Therefore, multiple validation sets are used and the results are averaged out.
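A rough sketch of that averaging idea, assuming scikit-learn's k-fold cross-validation; the 5 folds and the logistic regression model are arbitrary choices, not something fixed by the notes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression(max_iter=1000)

# Each of the 5 folds takes a turn as the validation set,
# while the remaining folds are used for training.
scores = cross_val_score(model, X, y, cv=5)
print("per-fold accuracy:", scores)
print("averaged accuracy:", scores.mean())
```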