1. Intuition
The forward pass is the process of feeding input data through a neural network layer by layer to produce a prediction. Think of it like an assembly line where raw materials (input features) pass through multiple workstations (layers), each transforming the data, until a final product (prediction) emerges.
Real-Life Example 🏠 — Predicting House Prices
| Feature | Value |
|---|---|
| Size (sq ft) | 1500 |
| Bedrooms | 3 |
| Age (years) | 10 |
A neural network processes these features through layers of "expert committees", each combining inputs with learned weights to form progressively more abstract representations.
2. Core Operations
2.1 Linear Transformation
At each layer, inputs are combined using a weighted sum:
$$z = W \cdot x + b$$
Where:
- $W$ — weight matrix (learned parameters)
- $x$ — input vector
- $b$ — bias vector (baseline shift)
- $z$ — pre-activation output (the logit)
House price example (single neuron):
$$z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$$
$$z = (0.5)(1500) + (0.3)(3) + (-0.1)(10) + 50 = 799.9$$
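The weighted sum above can be sketched in a few lines of NumPy, using the weights and house features from the example:

```python
import numpy as np

# House features: size (sq ft), bedrooms, age (years)
x = np.array([1500.0, 3.0, 10.0])

# Learned weights and bias for a single neuron (values from the example)
w = np.array([0.5, 0.3, -0.1])
b = 50.0

# Pre-activation output: z = w . x + b
z = np.dot(w, x) + b
print(round(z, 1))  # 799.9
```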
2.2 Activation Function
The linear output is passed through an activation function to introduce non-linearity:
$$a = f(z)$$
Using ReLU:
$$a = \max(0, z) = \max(0, 799.9) = 799.9$$
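ReLU itself is a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def relu(z):
    """ReLU activation: passes positive values through, clips negatives to zero."""
    return np.maximum(0.0, z)

print(relu(799.9))  # 799.9
print(relu(-5.0))   # 0.0
```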
3. Multi-Layer Forward Pass
For a network with L layers:
$$z^{[l]} = W^{[l]} \cdot a^{[l-1]} + b^{[l]}$$
$$a^{[l]} = f(z^{[l]})$$
Where $a^{[0]} = x$ (the raw input).
Two-layer example:
$$z^{[1]} = W^{[1]} x + b^{[1]}, \quad a^{[1]} = \mathrm{ReLU}(z^{[1]})$$
$$z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}, \quad \hat{y} = z^{[2]}$$
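The layer-by-layer recurrence can be sketched as a simple loop. The layer sizes below (3 inputs → 4 hidden units → 1 output) are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def forward(x, params):
    """Forward pass: params is a list of (W, b) pairs, one per layer.
    ReLU on hidden layers, linear (identity) output layer."""
    a = x                                           # a[0] = x, the raw input
    for i, (W, b) in enumerate(params):
        z = W @ a + b                               # z[l] = W[l] a[l-1] + b[l]
        a = relu(z) if i < len(params) - 1 else z   # hidden: ReLU; output: linear
    return a

# Illustrative shapes: 3 inputs -> 4 hidden units -> 1 output
params = [
    (rng.standard_normal((4, 3)), np.zeros(4)),
    (rng.standard_normal((1, 4)), np.zeros(1)),
]
x = np.array([1500.0, 3.0, 10.0])
y_hat = forward(x, params)
print(y_hat.shape)  # (1,)
```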
4. Architecture Diagram
*(Architecture diagram: input layer → hidden layers → output prediction.)*
5. Step-by-Step Numeric Example
Setup:
- Input: $x = [1500, 3, 10]$
- Weights, layer 1: $W^{[1]} = [[0.5, 0.3, -0.1]]$
- Bias, layer 1: $b^{[1]} = 50$
Step 1 — Linear:
$$z^{[1]} = 0.5(1500) + 0.3(3) + (-0.1)(10) + 50 = 799.9$$
Step 2 — Activate:
$$a^{[1]} = \mathrm{ReLU}(799.9) = 799.9$$
Step 3 — Output layer (convert to dollars with learned output weight $w = 1000$):
$$\hat{y} = 1000 \cdot a^{[1]} = \$799{,}900$$
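The three steps above can be checked end to end in a few lines; the output weight of 1000 is the example's assumed value:

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])
W1 = np.array([[0.5, 0.3, -0.1]])   # layer-1 weights (1 neuron, 3 inputs)
b1 = np.array([50.0])

z1 = W1 @ x + b1                    # Step 1 - linear:  799.9
a1 = np.maximum(0.0, z1)            # Step 2 - ReLU:    799.9
y_hat = 1000.0 * a1                 # Step 3 - output:  799,900
print(round(float(y_hat[0])))       # 799900
```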
6. Key Components Summary
| Component | Symbol | Role | Analogy |
|---|---|---|---|
| Input | x | Raw features | Raw ingredients |
| Weights | W | Feature importance | Recipe proportions |
| Bias | b | Baseline offset | Chef's personal touch |
| Activation | f(z) | Non-linearity | Quality filter |
| Output | y^ | Prediction | Final dish |
7. What Happens Next?
The forward pass gives us a prediction $\hat{y}$. But how do we know if it's good?
- The prediction ($\hat{y} = \$799{,}900$) is compared to the **true value** ($y = \$300{,}000$)
- The error must be quantified — that's the job of the Loss Function (Article 2)
- The weights must be improved — that's Backpropagation (Article 3)
8. Quick Reference
$$z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \quad a^{[l]} = f(z^{[l]})$$