The simplest neural network is a single logistic regression unit: one neuron. The key insight is that when you chain many neurons together, you get a system that learns its own features rather than relying on manually engineered ones.
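A minimal sketch of that single unit, assuming one input feature and hand-picked placeholder weights (a trained network would learn `w` and `b`):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# One neuron = one logistic regression unit.
# The weight and bias values here are illustrative, not learned.
w = np.array([-0.8])  # weight on the single input feature (e.g. price)
b = 2.0               # bias

x = np.array([1.5])               # input feature
a = sigmoid(np.dot(w, x) + b)     # activation: the neuron's output in (0, 1)
print(a)
```

The activation `a` is a probability-like number between 0 and 1, which is exactly what the vocabulary list below calls the neuron's output.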
Key vocabulary introduced here:
- Activation (a): the output of a neuron. Named after the biological concept of a neuron "firing". In the demand prediction example, activation = probability of top seller.
- Layer: a group of neurons that take similar inputs and output a vector of activations together.
- Hidden layer: a middle layer whose values are not observed in the training data. The correct labels for affordability, awareness, or perceived quality are never provided; the network learns them.
- Input layer (layer 0): the raw feature vector x.
- Output layer: the final layer producing the prediction.
Why this architecture matters: The output layer is just logistic regression, but instead of using raw features, it uses learned features from the hidden layer. These learned features (affordability, awareness, perceived quality) are often better predictors than anything a human would engineer manually.
In practice, you don't manually assign which inputs go to which neuron. Every neuron in a layer receives all inputs from the previous layer. The network learns through training which inputs matter for each neuron by adjusting its parameters.
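The fully connected pattern described above can be sketched as a single dense-layer function. Shapes match the demand-prediction example (4 inputs into 3 hidden neurons); the weights are random placeholders, and the feature names are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dense(a_prev, W, b):
    """Fully connected layer: every neuron receives every input from the
    previous layer. Each row of W holds one neuron's weights."""
    return sigmoid(W @ a_prev + b)

# Hypothetical setup: 4 input features -> 3 hidden neurons.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))  # one row of weights per hidden neuron
b1 = np.zeros(3)

x = np.array([0.5, 1.0, 0.2, 0.8])  # 4 raw features (values made up)
a1 = dense(x, W1, b1)               # vector of 3 activations, one per neuron
print(a1.shape)
```

Training adjusts `W1` and `b1`; a neuron "ignores" an input simply by learning a near-zero weight for it, which is why no manual wiring is needed.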
Interview-Ready Deepening
Source-backed reinforcement: the points below restate key moments from the lecture in more detail and emphasize production tradeoffs.
- Building intuition for neural networks using a T-shirt top-seller prediction example.
- To illustrate how neural networks work, let's start with an example: predicting whether a T-shirt will be a top seller.
- In the simplest version of this example, the input feature x is just the price of the T-shirt, and that alone is the input to the learning algorithm.
- The fuller example uses four features to predict whether or not a T-shirt is a top seller.
- A final neuron then takes the three hidden activations as inputs and outputs the probability of this T-shirt being a top seller.
- In the terminology of neural networks, we're going to group these three neurons together into what's called a layer.
- The layer on the right is also called the output layer because the output of this final neuron is the probability predicted by the neural network.
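Putting the pieces from these bullets together, a forward pass through the whole T-shirt network might look like the sketch below. All weights are hand-picked placeholders, and the feature ordering and hidden-neuron labels are assumptions; only training would actually give the hidden units their affordability/awareness/quality meanings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 4 raw features -> 3 hidden neurons -> 1 output probability.
x = np.array([20.0, 5.0, 3.0, 0.7])  # assumed: price, shipping, marketing, a 4th feature

W1 = np.array([[-0.1, -0.2, 0.0, 0.0],   # "affordability"-like neuron
               [ 0.0,  0.0, 0.5, 0.0],   # "awareness"-like neuron
               [ 0.05, 0.0, 0.0, 1.0]])  # "perceived quality"-like neuron
b1 = np.array([3.0, -1.0, -1.5])

W2 = np.array([[1.2, 1.0, 0.8]])  # output layer: logistic regression on a1
b2 = np.array([-2.0])

a1 = sigmoid(W1 @ x + b1)   # layer 1: three hidden activations
a2 = sigmoid(W2 @ a1 + b2)  # output layer: probability of "top seller"
print(float(a2))
```

Note that the output layer is literally the single-neuron logistic unit from the start of the notes, just fed `a1` instead of `x`.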
Tradeoffs You Should Be Able to Explain
- More expressive models improve fit but can reduce interpretability and raise overfitting risk.
- More aggressive optimization settings (e.g. larger learning rates) can shorten training but may increase instability if learning dynamics are not monitored.
- Feature-rich pipelines improve performance ceilings but increase maintenance and monitoring complexity.
First-time learner note: Read each model as a dataflow system: inputs become representations, representations become scores, and scores become decisions through a chosen loss and thresholding policy.
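The last step of that dataflow, scores becoming decisions through a thresholding policy, can be sketched in a few lines. The 0.5 cutoff is a common default, not a universal rule; in practice it should reflect the relative cost of the model's errors:

```python
# Minimal sketch of a thresholding policy (hypothetical helper, not a
# library function): a probability becomes a yes/no decision.
def decide(probability, threshold=0.5):
    """Turn a model's output probability into a binary decision."""
    return probability >= threshold

print(decide(0.42))        # False with the default 0.5 threshold
print(decide(0.42, 0.3))   # True if missed positives are costlier
```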
Production note: Track three things relentlessly in ML systems: data shape contracts, evaluation methodology, and the operational meaning of the model's errors. Most expensive failures come from one of those three.
Representation-learning lens: the hidden layer is useful because it transforms raw commercial signals into business concepts the final classifier can use. Price and shipping matter not just as raw numbers; they become more useful once the network combines them into an affordability-like signal, while marketing spend is turned into an awareness-like signal.
Architecture flow: raw features -> learned latent concepts -> final decision. This is the first time the course shows why hidden layers matter: they create an internal language that makes the last prediction problem simpler than working directly on the raw columns.