This choice depends on future positive-case behavior, not only class imbalance.
Use anomaly detection when: positives are rare, diverse, and likely to include new patterns not represented in current labels.
Use supervised learning when: you have enough labeled positives/negatives and future positives resemble historical positives.
Fraud vs spam contrast: fraud patterns evolve quickly, making novelty detection valuable; spam patterns are more repetitive, making supervised classification effective.
Manufacturing contrast: known recurring defects can be supervised; unknown future defect types are better handled by anomaly detection.
Decision rule: ask whether your positive class is stable and well-covered. If not, anomaly detection is often the safer baseline.
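The decision rule above can be sketched as a toy heuristic. This is an illustrative encoding only: the `min_positives` cutoff of 20 is an assumption for the example, not a number from the source.

```python
def choose_method(num_labeled_positives: int,
                  positives_evolve: bool,
                  min_positives: int = 20) -> str:
    """Toy encoding of the decision rule: is the positive class
    stable and well-covered by labels? (min_positives is an
    illustrative assumption, not a rule from the source.)"""
    if positives_evolve or num_labeled_positives < min_positives:
        return "anomaly detection"   # unstable or under-covered positives
    return "supervised learning"     # stable, well-labeled positives
```

In practice this check is a judgment call on label coverage and drift, not a fixed count.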
Interview-Ready Deepening
Source-backed reinforcement: these points expand on the brief in-course hints with additional detail and emphasize production tradeoffs.
- Pick anomaly detection for rare and evolving positives; pick supervised learning when positives are sufficiently labeled and stable.
- Anomaly detection and supervised learning look at the same dataset in quite different ways, and this difference drives the choice between them.
- Anomaly detection models only the normal (y = 0) examples, learning what "normal" looks like and flagging anything that deviates; supervised learning instead learns a decision boundary from both classes.
- Supervised learning is still used to find previously observed forms of fraud, even where anomaly detection catches the novel ones.
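The "model what normal looks like" point above can be made concrete with a per-feature Gaussian density fit to the y = 0 examples only. This is a minimal sketch; in practice `log_eps` would be tuned on a validation set containing a few labeled anomalies.

```python
import numpy as np

def fit_gaussian(X_normal):
    """Fit independent per-feature Gaussians to normal (y = 0) data only."""
    mu = X_normal.mean(axis=0)
    var = X_normal.var(axis=0) + 1e-9  # guard against zero variance
    return mu, var

def log_density(X, mu, var):
    """Log p(x) under the independent-Gaussian model; low = anomalous."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mu) ** 2 / var,
                         axis=1)

def is_anomaly(X, mu, var, log_eps):
    """Flag points whose log-density falls below the threshold log_eps."""
    return log_density(X, mu, var) < log_eps
```

Note that no positive examples are needed to fit the model; labeled anomalies are used only to pick the threshold, which is exactly why the method tolerates rare, evolving positives.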
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- Aggressive optimization settings (e.g., larger learning rates) reduce training time but can destabilize training if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
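The dataflow reading above can be sketched as a generic pipeline. The functions here are placeholders standing in for any model's stages, not a specific implementation.

```python
from typing import Callable

def pipeline(x,
             featurize: Callable,  # inputs -> representation
             score: Callable,      # representation -> scalar score
             threshold: float) -> bool:
    """Read a model as a dataflow system: input -> representation ->
    score -> decision, with the decision made by a thresholding policy."""
    z = featurize(x)
    s = score(z)
    return s >= threshold
```

The loss function lives inside how `featurize` and `score` were trained; the thresholding policy is a separate, operational choice that can be retuned without retraining.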
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.
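The "data shape contracts" point can be made concrete with a minimal validation check run at ingestion time. This is a sketch; the field names and dict-of-rows representation are assumptions for illustration.

```python
def check_shape_contract(rows, required_fields):
    """Minimal data-shape contract: every record must carry the
    expected fields. Raises early instead of letting a silently
    malformed batch reach the model. Field names are illustrative."""
    for i, row in enumerate(rows):
        missing = set(required_fields) - set(row.keys())
        if missing:
            raise ValueError(f"row {i} missing fields: {sorted(missing)}")
```

Failing loudly at the contract boundary is usually cheaper than debugging a silent distribution shift downstream.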