If you use the test set to choose between models (e.g. trying polynomial degrees 1–10 and picking the best), you introduce a subtle bias: J_test for the chosen model is an optimistically biased estimate of generalization. The test set was used as part of the selection process.
The solution: add a third split, the cross-validation set (also called the validation set, dev set, or development set).
Three-way split: training (~60%) / cross-validation (~20%) / test (~20%)
Workflow:
- Fit parameters w, b on the training set
- Choose model (degree, architecture, λ) using J_cv on the cross-validation set
- Report final performance using J_test on the test set
This keeps the test set pristine: it never influenced any decision. J_test is then an unbiased estimate of true generalization error.
Cross-validation applies to choosing any model hyperparameter: polynomial degree, neural network architecture (layers/units), regularization λ, etc.
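A minimal sketch of this workflow on synthetic data (numpy only; the data, split sizes, and variable names are all hypothetical, not from the source): fit each candidate degree on the training split, choose the degree with the lowest J_cv, and only then touch the test split once.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem: y = sin(x) + noise.
x = rng.uniform(-3, 3, 200)
y = np.sin(x) + rng.normal(scale=0.2, size=x.shape)

# Three-way split: ~60% train / ~20% cross-validation / ~20% test.
idx = rng.permutation(len(x))
tr, cv, te = idx[:120], idx[120:160], idx[160:]

def mse(degree, x_fit, y_fit, x_eval, y_eval):
    # Fit a polynomial on one split, evaluate mean squared error on another.
    coeffs = np.polyfit(x_fit, y_fit, degree)
    pred = np.polyval(coeffs, x_eval)
    return np.mean((pred - y_eval) ** 2)

# Model selection uses J_cv only -- the test set is never consulted here.
j_cv = {d: mse(d, x[tr], y[tr], x[cv], y[cv]) for d in range(1, 11)}
best = min(j_cv, key=j_cv.get)

# Report final performance exactly once, on the untouched test set.
j_test = mse(best, x[tr], y[tr], x[te], y[te])
print(f"chosen degree={best}, J_cv={j_cv[best]:.4f}, J_test={j_test:.4f}")
```

Because the degree was chosen by J_cv, the printed J_test remains an honest estimate of generalization; reporting J_cv for the winner instead would repeat the optimistic-bias problem described above.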
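The same procedure works for a regularization strength. A sketch assuming ridge regression with its closed-form solution (synthetic data; the candidate λ grid and all names are illustrative, not from the source):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: linear signal in 3 of 20 features, so an
# unregularized fit on few examples is prone to overfitting.
n, d = 60, 20
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.normal(scale=0.5, size=n)

X_tr, y_tr = X[:36], y[:36]      # ~60% train
X_cv, y_cv = X[36:48], y[36:48]  # ~20% cross-validation
X_te, y_te = X[48:], y[48:]      # ~20% test

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: w = (X^T X + lam*I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def j(w, X, y):
    # Squared-error cost (no regularization term when evaluating).
    return np.mean((X @ w - y) ** 2) / 2

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
j_cv = {lam: j(ridge_fit(X_tr, y_tr, lam), X_cv, y_cv) for lam in lambdas}
best_lam = min(j_cv, key=j_cv.get)
j_test = j(ridge_fit(X_tr, y_tr, best_lam), X_te, y_te)
print(f"lambda chosen by J_cv: {best_lam}, J_test: {j_test:.4f}")
```

Note the design choice: λ enters the training objective, but J_cv and J_test are computed without the regularization term, since they estimate prediction error, not the training objective.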
Interview-Ready Deepening
These points from the source lecture add detail beyond the summary above:
- The name "cross-validation" refers to using an extra dataset to check, or cross-check, the validity (really, the accuracy) of different models.
- m_cv denotes the number of cross-validation examples; J_cv is computed over those examples only.
- The cross-validation error is also commonly called the validation error, the development set error, or simply the dev error.
- This model selection procedure also works for choosing among other types of models, not just polynomial degrees.
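Following the squared-error convention for regression, J_cv averages squared errors over the m_cv cross-validation examples (written here as a sketch consistent with the notation above; note that no regularization term appears in evaluation errors, even if λ was used during training):

```latex
J_{cv}(w, b) = \frac{1}{2 m_{cv}} \sum_{i=1}^{m_{cv}}
  \left( f_{w,b}\left(x_{cv}^{(i)}\right) - y_{cv}^{(i)} \right)^2
```

J_test is defined identically over the m_test test examples.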
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- Faster optimization settings (e.g. larger learning rates) can reduce training time but may increase instability if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.
The train/cross-validation/test split is a governance mechanism. Training data is for fitting parameters. Cross-validation data is for choosing models and hyperparameters. Test data is for one final unbiased estimate after your design choices are already locked in.
Failure mode: peeking at the test set during iteration quietly turns it into another validation set. Once that happens, your final reported test number no longer represents unseen performance.