Classification maps input features to a finite label set. Unlike regression, which predicts any numeric value, classification predicts membership in predefined classes.
Core types:
- Binary classification: two labels (fraud/not fraud, malignant/benign).
- Multi-class classification: one label among many (digit 0-9, disease type A/B/C).
- Multi-label classification: multiple labels can be true at once (email tagged as both billing + urgent).
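To make the three types above concrete, here is a minimal sketch of how their targets are commonly laid out. The label values, the `TAGS` vocabulary, and the `to_indicator` helper are hypothetical and chosen only for illustration; they do not come from the source note.

```python
# Hypothetical toy targets illustrating the three label layouts.

# Binary: one value per example, drawn from two classes.
binary_labels = [0, 1, 1, 0]  # e.g. 1 = fraud, 0 = not fraud

# Multi-class: one value per example, drawn from more than two classes.
multiclass_labels = [3, 7, 0, 9]  # e.g. handwritten digits 0-9

# Multi-label: one 0/1 indicator per tag for every example.
TAGS = ["billing", "urgent", "spam"]  # assumed tag vocabulary


def to_indicator(tags_for_example, vocabulary=TAGS):
    """Encode a set of tags as a 0/1 vector aligned with the vocabulary."""
    return [1 if tag in tags_for_example else 0 for tag in vocabulary]


multilabel_targets = [
    to_indicator({"billing", "urgent"}),  # two labels true at once
    to_indicator({"spam"}),
    to_indicator(set()),                  # no label applies
]
print(multilabel_targets)
```

The practical difference is that binary and multi-class targets carry exactly one label per example, while a multi-label target is a vector in which any number of entries can be 1.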
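Model output perspective: most classifiers produce class scores or probabilities, then apply a threshold (binary and multi-label tasks) or an argmax (multi-class tasks) to emit final class decisions. Threshold tuning is a business decision, not just a model detail, because moving the threshold trades false positives against false negatives.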
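The decision step described above can be sketched in a few lines. The function names and the default threshold of 0.5 are assumptions made for illustration, not part of any particular library.

```python
def decide_binary(prob_positive, threshold=0.5):
    """Return 1 when the positive-class probability reaches the threshold."""
    return 1 if prob_positive >= threshold else 0


def decide_multiclass(class_probs):
    """Return the index of the highest-probability class (argmax)."""
    return max(range(len(class_probs)), key=lambda i: class_probs[i])


def decide_multilabel(label_probs, threshold=0.5):
    """Threshold each label independently, so several can be 1 at once."""
    return [1 if p >= threshold else 0 for p in label_probs]


# The same probability yields different decisions under different thresholds.
print(decide_binary(0.3))                 # default threshold of 0.5
print(decide_binary(0.3, threshold=0.2))  # a lower threshold flags more cases
print(decide_multiclass([0.1, 0.7, 0.2]))
print(decide_multilabel([0.9, 0.6, 0.1]))
```

Lowering the threshold catches more positives at the cost of more false alarms, which is why the right value depends on what each kind of error costs.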
Evaluation must match risk profile:
- Accuracy for balanced low-risk tasks.
- Precision/Recall/F1 when false positives/negatives have different costs.
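- ROC-AUC/PR-AUC to judge ranking quality across all thresholds rather than at one fixed threshold; PR-AUC is usually the more informative of the two on heavily imbalanced data.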
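As a minimal sketch of the precision, recall, and F1 metrics listed above, the following computes them from raw predictions for a single positive class. The helper names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is one common convention, assumed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = harmonic mean."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # assumed convention
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 3 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Precision answers "of the cases flagged, how many were right?" and recall answers "of the real positives, how many were caught?", so the metric to prioritize follows from which error is more expensive.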
Common failure mode: using accuracy alone on imbalanced datasets (e.g., 99% non-fraud), which can look strong while missing almost all positives.
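The failure mode above is easy to reproduce. This sketch assumes a hypothetical dataset of 1,000 transactions with 10 fraud cases (99% non-fraud) and a degenerate model that always predicts "not fraud".

```python
# Hypothetical imbalanced dataset: 10 fraud cases (1) among 1,000 transactions.
y_true = [1] * 10 + [0] * 990

# A degenerate "model" that never predicts fraud.
y_pred = [0] * len(y_true)

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
actual_positives = sum(y_true)
recall = true_positives / actual_positives

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
# prints: accuracy=0.99 recall=0.00
```

The model scores 99% accuracy while catching none of the fraud, which is why recall (or PR-AUC) has to be reported alongside accuracy on data like this.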
Deepening Notes
Source-backed reinforcement: these points are extracted from the session source note to strengthen your theory intuition.
- There's a second major type of supervised learning algorithm called a classification algorithm.
- What distinguishes classification from regression when interpreting the outputs is that classification predicts one of a small, finite set of possible output categories, such as 0, 1, and 2, but not the numbers in between, like 0.5 or 1.7.
- The two major types of supervised learning are regression and classification.
- In classification, the learning algorithm has to predict a category out of a small set of possible outputs.
- So you now know what supervised learning is, including both regression and classification.
Interview-Ready Deepening
Source-backed reinforcement: these points add detail beyond short-duration UI hints and emphasize production tradeoffs.
- In the supervised learning example we've been looking at, there was only one input value: the size of the tumor.
- In a regression application like predicting prices of houses, the learning algorithm has to predict numbers from infinitely many possible output numbers.
- Other machine learning problems often require many more input values.
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- More aggressive optimization settings (for example, a larger learning rate) can reduce training time but may make training unstable or divergent if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
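The dataflow reading above can be sketched end to end with a one-feature logistic model on the tumor-size example. Everything numeric here is an assumption made for illustration: the weight, bias, normalization constants, and 0.5 threshold are hypothetical, not learned values or figures from the source note.

```python
import math

# Hypothetical parameters; a real model would learn these from data.
WEIGHT = 2.0
BIAS = -1.0


def represent(tumor_size_cm, mean=2.5, std=1.0):
    """Input -> representation: standardize the raw feature (assumed stats)."""
    return (tumor_size_cm - mean) / std


def score(feature):
    """Representation -> score: a linear function of the feature."""
    return WEIGHT * feature + BIAS


def to_probability(z):
    """Score -> probability via the sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(label, prob, eps=1e-12):
    """Binary cross-entropy for one example; prob is clipped to avoid log(0)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def decide(prob, threshold=0.5):
    """Probability -> decision under a chosen thresholding policy."""
    return 1 if prob >= threshold else 0


feature = represent(3.0)
prob = to_probability(score(feature))
print(feature, prob, decide(prob), log_loss(1, prob))
```

Training adjusts the weight and bias to lower the loss, while the threshold is chosen afterwards, which is why the loss and the thresholding policy are separate design decisions.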
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.