Guided Starter Example
Predicting house price (always positive) → ReLU output. Predicting temperature change (positive or negative) → linear output. Classifying email spam/not spam → sigmoid output. All hidden layers → ReLU by default.
How to pick the right activation function for output and hidden layers based on what you're predicting.
The choice of activation function depends on what the neuron is computing. For the output layer, the target label y determines the natural choice:
- Binary classification → sigmoid: it outputs probabilities between 0 and 1.
- Regression over any real number → linear: no constraint on output range.
- Non-negative regression → ReLU: it only produces non-negative outputs.

For hidden layers, ReLU has become the dominant default choice for most practitioners today. Although early neural networks used sigmoid everywhere, the field evolved because ReLU has two practical advantages:
1. It is cheaper to compute: max(0, z) requires no exponentiation, unlike sigmoid.
2. It only goes flat on one side (for negative z), so gradients vanish less often and gradient descent converges faster than with sigmoid, which saturates at both extremes.

Other activations (tanh, Leaky ReLU, Swish) exist and occasionally outperform ReLU in specific cases, but ReLU is the safe default. Sigmoid at hidden layers is effectively obsolete: reserve it only for binary classification output neurons.
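The three output activations can be written directly from their definitions. A minimal NumPy sketch (the function names are my own, not from any framework):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real z into (0, 1): suitable as a probability
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # max(0, z): no exponentiation, and the output is never negative
    return np.maximum(0.0, z)

def linear(z):
    # Identity: the output can be any real number
    return z

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))  # every value lies strictly between 0 and 1
print(relu(z))     # negative inputs are clipped to 0
print(linear(z))   # inputs pass through unchanged
```

Note how each function's range matches the target semantics described above: probabilities, free real values, and non-negative values.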
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.
The output-layer rule here is simple and powerful: choose the final activation based on the range your target must live in. Binary outcome → sigmoid. Any real number → linear. Non-negative real number → ReLU. This is less about style and more about respecting the semantics of the prediction target.
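The rule above can be made mechanical. A hedged sketch with a hypothetical helper (the `output_activation` name and its target labels are illustrative, not a library API):

```python
def output_activation(target):
    # Map the semantics of the prediction target to the final activation.
    # target is one of: "binary", "real", "non_negative" (illustrative labels)
    rule = {
        "binary": "sigmoid",        # outcome in {0, 1} -> probability
        "real": "linear",           # any real number -> no squashing
        "non_negative": "relu",     # e.g. a price -> never below zero
    }
    return rule[target]

print(output_activation("binary"))        # sigmoid
print(output_activation("real"))          # linear
print(output_activation("non_negative"))  # relu
```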
Hidden-layer default: use ReLU unless you have a reason not to. The field moved there because it trains faster and avoids some of the heavy saturation behavior that made deep sigmoid networks frustrating to optimize.
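One way to see the saturation difference numerically (a sketch; `sigmoid_grad` and `relu_grad` are my own helper names): sigmoid's derivative collapses toward zero for large |z|, while ReLU's is exactly 1 for any positive z.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # peaks at 0.25, vanishes for large |z|

def relu_grad(z):
    return (z > 0).astype(float)   # 1 where active, 0 where inactive

for z in [0.0, 5.0, 10.0]:
    print(z, sigmoid_grad(np.array(z)), relu_grad(np.array(z)))
```

At z = 10, sigmoid's gradient is on the order of 1e-5 while ReLU's is still 1.0, which is the saturation behavior that made deep sigmoid networks slow to train.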
Concept-to-code walkthrough checklist for this topic.
Questions an interviewer is likely to ask about this topic. Think through your answer before reading the senior angle.
Test yourself before moving on. Flip each card to check your understanding: great for quick revision before an interview.
Drag to reorder the architecture flow for Choosing Activation Functions. This is designed as an interview rehearsal for explaining end-to-end execution.
This lab turns the activation-function guidance into a design decision: choose the output activation from the target range, then use ReLU as the default hidden-layer activation unless you have a strong reason not to.
- Sigmoid: target range is 0 to 1; the output should behave like a probability that y = 1.
- ReLU: the default modern choice for hidden layers because it is cheap to compute and only flat on one side. Gradient view: the gradient is strong for positive z and zero only in the inactive region.
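Tying the cards together, a small sketch of how a sigmoid output becomes a decision (the 0.5 threshold is a common default rather than the only choice, and the logit values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Raw output-layer scores (logits) for three emails -- illustrative values
logits = np.array([-1.5, 0.2, 3.0])
probs = sigmoid(logits)       # each value lies in (0, 1): read as P(spam)
is_spam = probs >= 0.5        # threshold the probability into a decision

print(probs)
print(is_spam)   # [False  True  True]
```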
What output activation for binary classification?
Sigmoid → outputs probability between 0 and 1 that y = 1.