Algorithm structure: for each feature x_j, estimate mu_j and sigma_j^2 from training data, then compute per-feature Gaussian probabilities and multiply them into p(x).
p(x) = product_j p(x_j; mu_j, sigma_j^2)
Decision rule: if p(x) < epsilon, mark as anomaly; otherwise treat as normal.
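The steps above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the synthetic data, the helper names (`fit_gaussians`, `p`), and the epsilon value are all assumptions for the example.

```python
import numpy as np

def fit_gaussians(X):
    """Estimate per-feature mean mu_j and variance sigma_j^2 from training rows."""
    return X.mean(axis=0), X.var(axis=0)

def p(X, mu, var):
    """Product over features of the per-feature Gaussian densities, per example."""
    densities = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return densities.prod(axis=1)

# Synthetic training data: two features with different scales.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[5.0, 10.0], scale=[1.0, 2.0], size=(1000, 2))
mu, var = fit_gaussians(X_train)

epsilon = 1e-4                       # in practice, chosen on a validation set
x_normal = np.array([[5.2, 9.8]])    # both features near their means
x_weird = np.array([[5.2, 25.0]])    # second feature far from its mean

print(p(x_normal, mu, var) < epsilon)  # stays above epsilon: not flagged
print(p(x_weird, mu, var) < epsilon)   # product collapses: flagged as anomaly
```

Note how a single extreme feature (25.0 vs. a mean near 10) is enough to drive the whole product below epsilon.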
Core intuition: one strongly unusual feature can push the product probability down sharply, making the whole example stand out.
Assumption: this factorization treats the features as statistically independent. Even when that assumption is imperfect, the approach can still be practical with good features.
Failure mode: correlated features and poor feature engineering can make p(x) poorly calibrated, causing both misses and noisy alerts.
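The correlated-features failure mode can be demonstrated directly. In this sketch (synthetic data; the specific correlation x2 ≈ 2*x1 is an assumption for illustration), a point that is jointly impossible but marginally typical gets roughly the same score as a genuinely typical point, so the factorized model cannot flag it:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=5000)
x2 = 2.0 * x1 + rng.normal(0.0, 0.1, size=5000)  # x2 strongly correlated with x1
X_train = np.column_stack([x1, x2])

mu, var = X_train.mean(axis=0), X_train.var(axis=0)

def p(x):
    """Factorized per-feature Gaussian density for a single example."""
    d = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return d.prod()

# Each coordinate of `anomaly` is unremarkable on its own, but the pair
# lies far off the x2 ~ 2*x1 line that all training data follows.
anomaly = np.array([1.5, -3.0])
typical = np.array([1.5, 3.0])
print(p(anomaly), p(typical))  # nearly identical scores: the anomaly is missed
```

A multivariate Gaussian with a full covariance matrix would separate these two points; the per-feature product cannot, because it ignores the correlation.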
Interview-Ready Deepening
Source-backed reinforcement: the points below restate and extend the core material, with an emphasis on production tradeoffs.
- Fit one Gaussian per feature, multiply densities into p(x), then classify with epsilon threshold.
- A deep understanding of statistical independence is not needed to use the anomaly detection algorithm effectively: in practice it often works well even when the features are not actually statistically independent.
- Each feature gets its own model: to model p(x_2), x_2 is a totally different feature, e.g. one measuring the vibrations of an airplane engine.
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- Faster optimization (e.g., a higher learning rate) can reduce training time but may increase instability if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
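The dataflow framing above can be made concrete with a toy pipeline. Every name here (`represent`, `score`, `decide`) is illustrative, not from any library; the point is only the shape of the flow: inputs to representations to scores to decisions via a thresholding policy.

```python
import numpy as np

def represent(x):
    """Inputs -> representations: standardization as a minimal stand-in."""
    return (x - x.mean()) / (x.std() + 1e-8)

def score(features, weights):
    """Representations -> scores: a linear model as the simplest case."""
    return float(features @ weights)

def decide(s, threshold=0.0):
    """Scores -> decisions: an explicit thresholding policy."""
    return s >= threshold

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.2, 0.1])
print(decide(score(represent(x), w)))
```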
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.