Concept-Lab
Machine Learning

Inference & Forward Propagation

The algorithm for making predictions: computing activations left to right through all layers.

Core Theory

Forward propagation is the inference algorithm: the sequence of computations that turns input x into prediction ŷ. It proceeds layer by layer, left to right, which is why it's called "forward".

Step-by-step for a 3-layer network:

  1. Compute a[1]: apply layer 1's 25 units to input x. Each unit computes sigmoid(w·x + b). Output: vector of 25 activations.
  2. Compute a[2]: apply layer 2's 15 units to a[1]. Output: vector of 15 activations.
  3. Compute a[3]: apply the single output unit to a[2]. Output: scalar probability.
  4. Optional threshold: if a[3] ≥ 0.5, predict ŷ = 1 (same as logistic regression).
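The four steps above can be sketched in NumPy. This is a minimal sketch, not a full implementation: the layer sizes (25, 15, 1) follow the example network, and the weight matrices are randomly initialized stand-ins for trained parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_in, W, b):
    # One layer: every unit computes sigmoid(w · a_in + b).
    return sigmoid(W @ a_in + b)

def forward_prop(x, params):
    # Apply each layer left to right: a[1], a[2], ..., a[L].
    a = x
    for W, b in params:
        a = dense(a, W, b)
    return a

rng = np.random.default_rng(0)
sizes = [64, 25, 15, 1]  # input -> layer 1 -> layer 2 -> output
params = [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
          for n_in, n_out in zip(sizes, sizes[1:])]

x = rng.normal(size=64)
a3 = forward_prop(x, params)       # final activation: a vector with one entry
y_hat = 1 if a3[0] >= 0.5 else 0   # optional threshold, as in logistic regression
```

Note that inference is nothing more than this loop: with trained parameters in `params`, no gradient machinery is involved.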

The output of a neural network is also written as f(x), consistent with how we wrote the logistic regression output in Course 1. The neural network is just a more expressive version of f(x).

Forward vs backward: forward propagation computes predictions. Backward propagation (backprop), covered in Week 2, computes gradients for training. If you already have trained parameters w and b (e.g., downloaded from the internet), you only need forward propagation for inference.

Typical architecture pattern: more units in earlier layers, fewer as you get deeper toward the output. This is a common and generally effective architecture choice.

Interview-Ready Deepening

Source-backed reinforcement: these lecture excerpts add detail beyond the summary above.

  • "This will be an algorithm called forward propagation."
  • "Given these 64 input features, we're going to use the neural network with two hidden layers."

Tradeoffs You Should Be Able to Explain

  • More expressive models improve fit but can reduce interpretability and raise overfitting risk.
  • Higher optimization speed can reduce training time but may increase instability if learning dynamics are not monitored.
  • Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.

First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.

Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.

Forward propagation is a deterministic dataflow graph. Once the parameters are fixed, inference is just repeated function application from left to right. This is why inference services can be cached, benchmarked, and profiled like any other computation pipeline.

Production view: inference latency depends on layer count, width, activation cost, and batch size. The math topic here connects directly to serving design later: knowing where activations are computed tells you where memory, latency, and numerical issues show up in real systems.
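As a minimal sketch of that production view, one can time the forward pass at different batch sizes. The toy NumPy network and the batch sizes here are illustrative assumptions, not a real serving benchmark.

```python
import time
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# Fixed parameters: once these are set, inference is deterministic.
W1, b1 = rng.normal(size=(25, 64)), rng.normal(size=25)
W2, b2 = rng.normal(size=(15, 25)), rng.normal(size=15)
W3, b3 = rng.normal(size=(1, 15)), rng.normal(size=1)

def forward(X):
    # Batched forward pass: X has shape (batch, 64).
    a1 = sigmoid(X @ W1.T + b1)
    a2 = sigmoid(a1 @ W2.T + b2)
    return sigmoid(a2 @ W3.T + b3)

for batch in (1, 32, 256):
    X = rng.normal(size=(batch, 64))
    t0 = time.perf_counter()
    out = forward(X)
    dt_ms = (time.perf_counter() - t0) * 1e3
    print(f"batch={batch:4d} latency={dt_ms:.3f} ms output shape={out.shape}")
```

Because the computation is a fixed dataflow graph, this kind of profiling generalizes: wider layers and more of them mean more work per request, and batching trades per-request latency for throughput.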


💡 Concrete Example

Handwritten digit recognition (8×8 image, binary 0 vs 1): x is 64 numbers → a[1] is 25 numbers → a[2] is 15 numbers → a[3] is 1 number (probability of being digit '1'). If a[3] = 0.73, threshold at 0.5 gives ŷ = 1. That's the complete inference path.
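A shape-level sketch of that inference path, written layer by layer. The weights are random stand-ins; a trained model would load real parameters instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
x = rng.random(64)  # flattened 8x8 image: 64 pixel intensities

W1, b1 = rng.normal(size=(25, 64)), np.zeros(25)
W2, b2 = rng.normal(size=(15, 25)), np.zeros(15)
W3, b3 = rng.normal(size=(1, 15)), np.zeros(1)

a1 = sigmoid(W1 @ x + b1)    # 64 numbers -> 25 activations
a2 = sigmoid(W2 @ a1 + b2)   # 25 activations -> 15 activations
a3 = sigmoid(W3 @ a2 + b3)   # 15 activations -> 1 probability
y_hat = int(a3[0] >= 0.5)    # threshold at 0.5

print(a1.shape, a2.shape, a3.shape, y_hat)  # (25,) (15,) (1,) then 0 or 1
```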



🧪 Interactive Sessions

  1. Concept Drill: Manipulate key parameters and observe behavior shifts for Inference & Forward Propagation.
  2. Failure Mode Lab: Trigger an edge case and explain remediation decisions.
  3. Architecture Reorder Exercise: Reorder 5 flow steps into the correct production sequence.

💻 Code Walkthrough

Concept-to-code walkthrough checklist for this topic.

  1. Define input/output contract before reading implementation details.
  2. Map each conceptual step to one concrete function/class decision.
  3. Call out one tradeoff and one failure mode in interview wording.

🎯 Interview Prep

Questions an interviewer is likely to ask about this topic. Think through your answer before reading the senior angle.

  • Q1 [beginner] Describe forward propagation step by step for a 3-layer neural network.
    Strong answer: compute a[1] by applying layer 1's units to input x, a[2] from a[1], then a[3] from a[2]; each unit computes sigmoid(w·x + b), and an optional threshold at 0.5 turns a[3] into ŷ.
  • Q2 [intermediate] What is the difference between forward propagation and backpropagation?
    Strong answer: forward propagation computes predictions and is all you need for inference; backpropagation computes gradients and only runs during training.
  • Q3 [expert] Why is the neural network output written as both a[L] and f(x)?
    Strong answer: a[L] is the final layer's activation, while f(x) emphasizes that the whole network is one function of the input, consistent with the logistic regression notation from Course 1.
  • Q4 [expert] How would you explain this in a production interview with tradeoffs?
    In interviews, always connect inference back to deployment: "Forward propagation is what runs in production; it's the prediction path. Backpropagation only runs during training, offline. In serving infrastructure, you only need the trained weights and the forward pass."
🏆 Senior answer angle
Use the tier progression: beginner correctness → intermediate tradeoffs → expert production constraints and incident readiness.
