Guided Starter Example
J(w) = w². Using SymPy, diff(w**2, w) returns 2*w. At w = 3 the derivative is 6, meaning if w increases by 0.001, J increases by ~0.006. Gradient descent subtracts α·6 from w, pushing w toward the minimum at w = 0.
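The SymPy call above can be checked directly; a minimal sketch, assuming SymPy is installed:

```python
# Symbolic derivative of J(w) = w**2 with SymPy.
import sympy as sp

w = sp.symbols("w")
J = w ** 2
dJ = sp.diff(J, w)           # returns 2*w
slope_at_3 = dJ.subs(w, 3)   # evaluates to 6
print(dJ, slope_at_3)
```

Evaluating the symbolic derivative at a point is what an autodiff framework does numerically at every training step.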
Intuitive foundations of calculus derivatives — the engine behind gradient descent and backpropagation.
A derivative answers a simple question: if we nudge the input by a tiny amount ε, how much does the output change?
Formally: if increasing w by ε causes J(w) to increase by k·ε, then the derivative of J with respect to w is k.
Example: J(w) = w². If w = 3 and we increase by ε = 0.001, then J(3.001) = 9.006001. J increased by ~0.006 = 6·ε. So the derivative is 6.
The derivative depends on the current value of w. When w = 2, derivative = 4. When w = -3, derivative = -6. This is why gradient descent takes variable step sizes implicitly — the same α produces different-sized parameter updates depending on gradient magnitude.
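The point-by-point behaviour described above can be confirmed with a finite-difference check, using only the ε-nudge definition (no SymPy needed):

```python
# Finite-difference slope of J(w) = w**2 at several points.
# Each estimate should be close to the true derivative 2*w.
def J(w):
    return w ** 2

eps = 1e-3
for w in (3.0, 2.0, -3.0):
    slope = (J(w + eps) - J(w)) / eps  # (change in J) / (change in w)
    print(w, slope)  # ~6, ~4, ~-6
```

Note the sign at w = -3: a negative derivative means increasing w *decreases* J, so gradient descent (which subtracts the derivative) moves w upward, again toward the minimum.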
In gradient descent: w = w - α · (dJ/dw)
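A minimal sketch of that update rule applied to J(w) = w², whose derivative is 2·w (α = 0.1 and the starting point are arbitrary choices for illustration):

```python
# Gradient descent on J(w) = w**2.
alpha = 0.1   # learning rate
w = 3.0       # starting point
for _ in range(100):
    grad = 2 * w           # dJ/dw at the current w
    w = w - alpha * grad   # update rule: w = w - alpha * dJ/dw
print(w)  # very close to the minimum at w = 0
```

Because the gradient shrinks as w approaches 0, the updates shrink too: the same α produces smaller steps near the minimum, exactly the implicit variable step size mentioned above.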
Notation: for single-variable functions, use d/dw J(w). For multi-variable functions (most ML), use ∂/∂w_j J(w) — called the partial derivative.
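A partial derivative is the same ε-nudge idea with all other parameters held fixed. A hypothetical two-parameter cost (not from the source) makes this concrete: J(w1, w2) = w1² + 3·w2, so ∂J/∂w1 = 2·w1 and ∂J/∂w2 = 3.

```python
# Hypothetical two-parameter cost to illustrate partial derivatives:
# nudge one parameter at a time, holding the other fixed.
def J(w1, w2):
    return w1 ** 2 + 3 * w2

eps = 1e-3
w1, w2 = 3.0, 1.0
dJ_dw1 = (J(w1 + eps, w2) - J(w1, w2)) / eps  # nudge w1 only -> ~6
dJ_dw2 = (J(w1, w2 + eps) - J(w1, w2)) / eps  # nudge w2 only -> ~3
print(dJ_dw1, dJ_dw2)
```

The vector of all such partials is the gradient, and each component feeds the same update rule w_j = w_j - α · ∂J/∂w_j.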
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.
The derivative is a local sensitivity measurement. It tells you how much the objective changes when you nudge one value slightly. That is exactly what an optimizer needs: not a global theory of the landscape, but a local signal telling it which direction is currently downhill.
Training intuition: a large derivative means the current parameter still has leverage over the loss. A tiny derivative means moving that parameter a bit does not change much right now. Gradient-based training is therefore a repeated sensitivity-analysis loop.
Concept-to-code walkthrough checklist for this topic.
Questions an interviewer is likely to ask about this topic. Think through your answer before reading the senior angle.
Test yourself before moving on. Flip each card to check your understanding — great for quick revision before an interview.
Informal definition of a derivative.
If increasing w by ε causes J(w) to increase by k·ε, the derivative of J w.r.t. w is k. It measures how sensitive J is to small changes in w.