K-means has an explicit objective function. The distortion cost is the average squared distance between each training point and its assigned centroid.
J = (1/m) * sum_{i=1}^{m} ||x^(i) - mu_{c^(i)}||^2
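The distortion J above can be computed directly. A minimal sketch (the function name `distortion` and the array layout are assumptions for illustration):

```python
import numpy as np

def distortion(X, centroids, assignments):
    """Average squared distance from each point to its assigned centroid.

    X: (m, n) data matrix; centroids: (k, n); assignments: (m,) cluster indices.
    """
    diffs = X - centroids[assignments]            # x^(i) - mu_{c^(i)} for every i
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```

For example, two points at (0, 0) and (2, 0) assigned to a centroid at (1, 0) each contribute a squared distance of 1, so J = 1.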
Assignment step effect: with centroids fixed, assigning each point to the nearest centroid decreases or preserves J.
Update step effect: with assignments fixed, replacing each centroid with the mean of its assigned points decreases or preserves J.
Convergence implication: J is non-increasing per iteration and bounded below by zero, so K-means converges; the result is a local optimum of the distortion, not necessarily the global one.
Debugging rule: if your measured distortion increases after a full iteration, suspect an implementation bug in assignment/update or indexing.
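The two steps and the debugging rule can be exercised together. This is a minimal sketch, not production K-means (no convergence check, empty clusters simply keep their old centroid); the `assert` encodes the rule that distortion must never increase:

```python
import numpy as np

def kmeans_step(X, centroids):
    # Assignment step: each point goes to its nearest centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points.
    new_centroids = centroids.copy()
    for k in range(len(centroids)):
        members = X[assign == k]
        if len(members):                          # leave empty clusters in place
            new_centroids[k] = members.mean(axis=0)
    return new_centroids, assign

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
centroids = X[:3].copy()                          # initialize from data points
prev = np.inf
for _ in range(10):
    centroids, assign = kmeans_step(X, centroids)
    J = float(np.mean(((X - centroids[assign]) ** 2).sum(axis=1)))
    # Both steps decrease or preserve J, so this should never fire.
    assert J <= prev + 1e-12, "distortion increased -> implementation bug"
    prev = J
```

If the assertion ever fires on a correct distance metric, the bug is in the assignment logic, the mean update, or the indexing between points and centroids.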
Interview-Ready Deepening
Source-backed reinforcement: these points restate and expand the ideas above with detail from the lecture, with an eye toward production tradeoffs.
- K-means minimizes distortion: average squared distance from each point to its assigned centroid.
- Update-step optimality: choosing each centroid mu_k to be the mean of the points assigned to it is exactly the choice that minimizes the distortion terms involving mu_k.
- Intuition: with two training examples assigned to one cluster, the midpoint between them is the location that minimizes the average squared distance to both.
- Each term being averaged in J is the squared distance between a training example and the centroid of the cluster to which that example has been assigned.
- If measured distortion ever goes up, there is a bug in the code: every step of K-means sets c^(i) and mu_k specifically to reduce (or preserve) the cost function, so J should never increase.
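The claim that the mean minimizes the squared-distance terms can be checked numerically. A small sketch (the point cloud and perturbations are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(1)
pts = rng.normal(size=(20, 2))        # points assigned to a single cluster
mean = pts.mean(axis=0)

def sum_sq(mu):
    # Summed squared distance from every assigned point to candidate centroid mu.
    return float(((pts - mu) ** 2).sum())

# Any perturbation away from the mean increases the summed squared distance.
for _ in range(100):
    candidate = mean + rng.normal(scale=0.5, size=2)
    assert sum_sq(mean) <= sum_sq(candidate)
```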
Tradeoffs You Should Be Able to Explain
- Higher recall often increases context noise; reranking and filtering are required to keep precision high.
- Smaller chunks improve semantic precision but can break cross-sentence context needed for accurate answers.
- Aggressive grounding reduces hallucinations but can increase abstentions when retrieval coverage is weak.
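The recall/precision tradeoff in the first bullet can be made concrete with a toy retrieval filter. The scores and relevance labels below are entirely hypothetical, invented only to show the mechanics of a score threshold:

```python
# Hypothetical (score, is_relevant) pairs for 8 retrieved passages.
scored = [(0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1),
          (0.60, 0), (0.50, 0), (0.40, 1), (0.30, 0)]
total_relevant = sum(label for _, label in scored)

def precision_recall(threshold):
    # Keep only passages whose score clears the threshold.
    kept = [label for score, label in scored if score >= threshold]
    if not kept:
        return 0.0, 0.0
    return sum(kept) / len(kept), sum(kept) / total_relevant

p_hi, r_hi = precision_recall(0.85)   # strict filter: high precision, low recall
p_lo, r_lo = precision_recall(0.35)   # permissive filter: high recall, more noise
```

Loosening the threshold admits more relevant passages (recall rises) but also more noise (precision falls), which is why a permissive retriever is usually paired with a reranking or filtering stage.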
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.