Advanced Learning Algorithms
The advanced ML track. It stays transcript-driven as new advanced lessons are added to Concept Lab.
Concepts Covered
Introduction to Advanced Learning Algorithms
Course 2 overview: neural networks, decision trees, and practical ML system advice.
Neurons and the Brain
The biological motivation behind neural networks and why deep learning took off when it did.
Neural Networks: Demand Prediction
Building intuition for neural networks using a T-shirt top-seller prediction example.
Recognising Images with Neural Networks
How neural networks build up visual understanding layer by layer — edges, parts, then faces.
Neural Network Layers
Layer notation, superscripts, and how a single hidden layer computes its activations.
More Complex Neural Networks
Multi-layer networks, counting conventions, and the general activation formula for any layer.
Inference & Forward Propagation
The algorithm for making predictions: computing activations left to right through all layers.
Inference in Code (TensorFlow)
Implementing forward propagation in TensorFlow with Dense layers — the coffee roasting example.
Data Representation in TensorFlow
NumPy 1D vectors vs 2D matrices, TensorFlow tensors, and why the double bracket matters.
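The distinction this lesson draws fits in two lines of NumPy. The values 200 and 17 echo the course's coffee-roasting features but are otherwise arbitrary:

```python
import numpy as np

v = np.array([200, 17])     # 1D vector: single brackets, shape (2,)
m = np.array([[200, 17]])   # 2D matrix with one row: double brackets, shape (1, 2)

# TensorFlow layers work on 2D batches, which is why the double bracket
# matters: m is a batch containing one example, while v has no
# row/column structure at all.
print(v.shape, m.shape)
```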
Building a Neural Network in TensorFlow
The Sequential API: string layers together, compile, fit, and predict in one workflow.
Forward Prop: Single Layer from Scratch
Implementing forward propagation in raw Python/NumPy — understanding what TensorFlow does under the hood.
General Forward Propagation
The dense() function: a reusable loop-based implementation of any layer using a weight matrix.
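A minimal NumPy sketch of the loop-based dense() idea, following the course's convention that column j of the weight matrix holds the weights of unit j; the layer sizes and numeric values below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense(a_in, W, b):
    """Loop-based dense layer. Column j of W holds the weights of
    unit j, so W has shape (inputs, units)."""
    units = W.shape[1]
    a_out = np.zeros(units)
    for j in range(units):
        w = W[:, j]                      # weight vector of unit j
        z = np.dot(w, a_in) + b[j]
        a_out[j] = sigmoid(z)
    return a_out

x = np.array([0.5, -1.2])                # 2 input features (illustrative)
W1 = np.array([[1.0, -2.0, 0.5],
               [0.3,  0.8, -1.0]])       # 3 hidden units
b1 = np.array([0.1, 0.0, -0.2])
a1 = dense(x, W1, b1)                    # activations of the hidden layer
```

Stacking calls to dense(), feeding each layer's output into the next, gives the full forward pass.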
Is There a Path to AGI?
ANI vs AGI, the one learning algorithm hypothesis, and keeping hype calibrated.
Vectorised Neural Network Implementation
Why matrix multiplication makes neural networks fast and how NumPy's matmul replaces for-loops.
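The same computation with the per-unit loop replaced by a single matmul; a sketch with illustrative shapes and values, where inputs are kept 2D so a whole batch can be processed at once:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def dense_vectorized(A_in, W, B):
    """One matrix multiplication replaces the per-unit loop.
    A_in: (m, n) batch of inputs; W: (n, units); B: (1, units)."""
    Z = np.matmul(A_in, W) + B          # equivalently: A_in @ W + B
    return sigmoid(Z)

A_in = np.array([[0.2, 1.7]])           # one example, kept 2D as a row
W = np.array([[1.0, -3.0, 5.0],
              [-2.0, 4.0, -6.0]])
B = np.array([[-1.0, 1.0, 2.0]])
A_out = dense_vectorized(A_in, W, B)    # shape (1, 3)
```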
Matrix Multiplication
Dot products, transposes, and vector-matrix products — the mathematical building blocks of neural networks.
Matrix Multiplication Rules
The general formula for AᵀW — computing every element systematically from rows and columns.
Matrix Multiplication in Code
NumPy matmul, the @ operator, and the complete vectorised forward pass in TensorFlow.
Training a Neural Network: Overview
The three-step training loop in TensorFlow: specify architecture, compile with loss, fit to data.
Training Details: Loss, Cost, and Backprop
Binary cross-entropy loss, cost function over all examples, and how TensorFlow uses backprop internally.
Alternatives to the Sigmoid Activation
ReLU, linear activation, and why hidden layers should almost never use sigmoid.
Choosing Activation Functions
How to pick the right activation function for output and hidden layers based on what you're predicting.
Why Do We Need Activation Functions?
Without nonlinear activations, a deep neural network collapses into a simple linear model.
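The collapse this lesson describes can be checked numerically: two stacked layers with linear (identity) activations are algebraically one linear layer. The shapes and random seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))              # batch of 4 examples, 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

# Two "layers" with no activation function...
deep = (x @ W1 + b1) @ W2 + b2

# ...equal one linear layer with combined parameters:
# x W1 W2 + (b1 W2 + b2)
W = W1 @ W2
b = b1 @ W2 + b2
shallow = x @ W + b

print(np.allclose(deep, shallow))
```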
Multiclass Classification
When y can take more than two values — classifying digits 0–9, diseases, or defect types.
Softmax Regression
The generalization of logistic regression to n classes — computing mutually exclusive class probabilities.
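The softmax formula a_j = e^{z_j} / Σ_k e^{z_k} in a direct NumPy sketch (fine for moderate inputs; the example scores are illustrative):

```python
import numpy as np

def softmax(z):
    """Turn a vector of raw class scores into probabilities that
    are positive and sum to 1."""
    e = np.exp(z)
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # scores for 3 classes
a = softmax(z)                  # highest score gets highest probability
```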
Neural Network with Softmax Output
Plugging softmax into the output layer to build a multiclass neural network.
Numerically Stable Softmax
Using from_logits=True to avoid floating-point roundoff errors in softmax and logistic loss.
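With from_logits=True, TensorFlow fuses softmax into the loss and rearranges the arithmetic internally. One such rearrangement, shown here from scratch as an illustration, is subtracting the max logit before exponentiating; the result is mathematically identical but cannot overflow:

```python
import numpy as np

def stable_softmax(z):
    """Subtract the largest logit before exponentiating. Shifting all
    logits by a constant leaves the probabilities unchanged but keeps
    every exponent <= 0, so np.exp can never overflow to inf."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

big = np.array([1000.0, 1000.0, 1000.0])
p = stable_softmax(big)   # naive np.exp(1000.0) would overflow
```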
Multi-Label Classification
When one input can have multiple independent labels simultaneously — cars AND pedestrians in one image.
Adam Optimizer
Adaptive Moment Estimation — the de facto standard optimizer that auto-adjusts per-parameter learning rates.
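A from-scratch scalar sketch of the Adam update rule; in the course you would simply pass an Adam optimizer to model.compile and let TensorFlow handle this. The quadratic objective and hyperparameters here are illustrative:

```python
import math

def adam_step(w, grad, m, v, t, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (scale)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize J(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t)
```

The per-parameter scaling by sqrt(v_hat) is what lets Adam auto-adjust the effective learning rate for each weight.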
Convolutional Layers
Beyond dense layers — how convolutional layers let neurons see only local regions for speed and robustness.
What is a Derivative?
Intuitive foundations of calculus derivatives — the engine behind gradient descent and backpropagation.
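The lesson's informal picture of a derivative as a slope can be checked numerically with a finite difference; numerical_derivative is an illustrative helper, not course code:

```python
def numerical_derivative(f, x, eps=1e-6):
    """Approximate f'(x) as the slope over a tiny symmetric interval."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# d/dx of x^2 is 2x, so the slope at x = 3 should be 6
slope = numerical_derivative(lambda x: x ** 2, 3.0)
```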
Computation Graph and Backprop
How neural network frameworks compute gradients efficiently using forward and backward passes through a graph.
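A tiny worked computation graph for J = (wx + b - y)^2, with the forward pass left to right and the chain rule applied right to left; the values of x, y, w, b are illustrative:

```python
x, y = 2.0, 5.0        # one training example
w, b = 3.0, 1.0        # parameters

# Forward pass: each node stores its value
c = w * x              # c = 6
a = c + b              # a = 7
d = a - y              # d = 2
J = d ** 2             # J = 4

# Backward pass: chain rule, one node at a time
dJ_dd = 2 * d          # J = d^2      ->  dJ/dd = 2d
dJ_da = dJ_dd * 1      # d = a - y    ->  dd/da = 1
dJ_db = dJ_da * 1      # a = c + b    ->  da/db = 1
dJ_dc = dJ_da * 1      #                  da/dc = 1
dJ_dw = dJ_dc * x      # c = w * x    ->  dc/dw = x
```

Each forward value is computed once and reused in the backward pass, which is why backprop over a graph with n nodes costs roughly one extra forward pass rather than n of them.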
Backprop in a Larger Network
Tracing backpropagation through a two-layer network — seeing how gradients flow back to every parameter.
Deciding What to Try Next
The systematic approach to ML debugging — why intuition fails and diagnostics save months of wasted effort.
Evaluating a Model
Train/test splits and why J_train alone deceives you — measuring generalization systematically.
Model Selection and Cross-Validation
The three-way split — why you need a cross-validation set to choose models without contaminating the test set.
Diagnosing Bias and Variance
The two fundamental failure modes of ML models — high bias underfits, high variance overfits.
Regularization and Bias-Variance
How λ shifts the bias-variance tradeoff — and how cross-validation finds the sweet spot automatically.
Establishing a Baseline Level of Performance
Why raw error numbers are misleading without a baseline — human-level performance as the anchor for bias-variance judgment.
Learning Curves
How training and cross-validation error change as data grows, and what that tells you about whether collecting more data is worth it.
Deciding What to Try Next, Revisited
How bias and variance map directly to the next engineering move, so you stop guessing and start debugging systematically.
Bias, Variance, and Neural Networks
Why deep learning changed the old bias-variance tradeoff story and gave engineers a new recipe for improving models.
Iterative Loop of ML Development
The real workflow of ML engineering: choose architecture, train, diagnose, refine, and repeat until performance is good enough.
Error Analysis
Manual review of model mistakes to discover which error classes matter most and where engineering effort will pay off.
Adding Data
Targeted data collection, augmentation, and synthetic data generation as strategic tools for improving model quality.
Transfer Learning
Use a model pre-trained on a large related dataset, then fine-tune it on your smaller task to get strong results with limited data.
Full Cycle of a Machine Learning Project
Training a model is only one stage; real ML systems also require scoping, deployment, monitoring, retraining, and MLOps discipline.
Fairness, Bias, and Ethics
Why ML engineers must think about harm, subgroup performance, and mitigation plans before and after deployment.
Error Metrics for Skewed Datasets
Why accuracy becomes misleading on rare-event problems, and how the confusion matrix gives a more truthful view of model usefulness.
Trading Off Precision and Recall
How threshold choices change which rare events you catch, which false alarms you accept, and why F1 is a useful but incomplete summary.
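Precision, recall, and F1 computed directly from confusion-matrix counts; the counts below are made up for illustration:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision: of flagged positives, how many were real.
    Recall: of real positives, how many were caught.
    F1: their harmonic mean, which punishes a low value of either."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# A rare-disease classifier: 15 true positives, 5 false alarms, 10 missed cases
p, r, f1 = precision_recall_f1(tp=15, fp=5, fn=10)
```

Raising the classification threshold typically moves tp down and fp down, trading recall for precision; F1 summarizes one point on that curve.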
Decision Tree Model
A decision tree predicts by asking a sequence of feature-based questions, routing an example down branches until it reaches a leaf decision.
Decision Tree Learning Process
How a tree is built recursively: choose the best split, partition the data, repeat on each branch, and stop when further splitting is no longer worth it.
Measuring Purity: Entropy
Entropy is the impurity measure that tells a decision tree how mixed a node is, with 0 meaning pure and 1 meaning maximally mixed in the binary case.
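The binary entropy formula H(p) = -p log2(p) - (1-p) log2(1-p) in a few lines, with the edge cases handled by the convention 0 log 0 = 0:

```python
import math

def entropy(p):
    """Impurity of a node where fraction p of examples are positive:
    0 for a pure node, 1 for a 50/50 mix."""
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```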
Choosing a Split with Information Gain
Information gain measures how much a candidate split reduces weighted entropy, allowing the tree to choose the most purity-improving feature.
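The information-gain formula as a sketch: root entropy minus the weighted average of branch entropies. The example split (a 50/50 root split into an 80% branch and a 20% branch) is illustrative:

```python
import math

def entropy(p):
    if p == 0 or p == 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_gain(p_root, p_left, p_right, w_left):
    """Entropy at the root minus the weighted entropy of the branches.
    w_left is the fraction of examples routed to the left branch."""
    w_right = 1 - w_left
    return entropy(p_root) - (w_left * entropy(p_left)
                              + w_right * entropy(p_right))

# Root is 50% positive; the split sends half the examples left (80%
# positive) and half right (20% positive)
gain = information_gain(p_root=0.5, p_left=0.8, p_right=0.2, w_left=0.5)
```

At each node the tree evaluates this quantity for every candidate feature and splits on the one with the highest gain.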
Decision Tree: Putting It Together
The full tree-building algorithm combines repeated split selection, recursive branch construction, and stopping rules into one practical training loop.
One-Hot Encoding of Categorical Features
How to convert a feature with multiple discrete categories into several binary indicators so trees and other models can use it cleanly.
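A minimal sketch of one-hot encoding, using the course's ear-shape example; the helper name one_hot is illustrative:

```python
def one_hot(value, categories):
    """Replace one categorical value with a list of 0/1 indicators,
    one per possible category. Exactly one indicator is 1."""
    return [1 if value == c else 0 for c in categories]

categories = ["pointy", "floppy", "oval"]
encoded = one_hot("floppy", categories)   # three binary features
```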
Continuous-Valued Features
How trees handle numeric features by testing candidate thresholds and selecting the split with the highest information gain.
Regression Trees
Generalizing decision trees from class prediction to numeric prediction by minimizing weighted variance and predicting leaf averages.
Using Multiple Decision Trees
Why single trees are sensitive to small data changes and how voting across many trees improves robustness.
Sampling with Replacement
Bootstrap sampling creates new training sets by repeatedly drawing from the original set with replacement.
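Sampling with replacement in a few lines of pure Python; the helper name and data are illustrative:

```python
import random

def bootstrap_sample(data, seed=None):
    """Draw len(data) examples uniformly with replacement, so some
    examples appear multiple times and others not at all."""
    rng = random.Random(seed)
    return [rng.choice(data) for _ in range(len(data))]

original = list(range(10))
sample = bootstrap_sample(original, seed=0)
```

Training each tree of an ensemble on a different bootstrap sample is the "bagging" half of the random forest algorithm in the next lesson.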
Random Forest Algorithm
Bagging plus random feature subsets per split yields more diverse trees and stronger aggregate performance.
XGBoost
Boosted trees focus sequentially on hard examples and are often top-performing on structured/tabular tasks.
When to Use Decision Trees
Choosing between tree ensembles and neural networks based on data modality, iteration speed, interpretability, and transfer learning needs.