ai · 2025-11-08 · 3 min read

The Forward Pass

The journey of data through a neural network — and what actually happens at each layer.

1. Intuition

The forward pass is the process of feeding input data through a neural network layer by layer to produce a prediction. Think of it like an assembly line where raw materials (input features) pass through multiple workstations (layers), each transforming the data, until a final product (prediction) emerges.

Real-Life Example 🏠 — Predicting House Prices

| Feature | Value |
| --- | --- |
| Size (sq ft) | 1500 |
| Bedrooms | 3 |
| Age (years) | 10 |

A neural network processes these features through layers of "expert committees", each combining inputs with learned weights to form progressively more abstract representations.


2. Core Operations

2.1 Linear Transformation

At each layer, inputs are combined using a weighted sum:

z = \mathbf{W} \cdot \mathbf{x} + \mathbf{b}

Where:

  • \mathbf{W} — weight matrix (learned parameters)
  • \mathbf{x} — input vector
  • \mathbf{b} — bias vector (baseline shift)
  • z — pre-activation output (logit)

House price example (single neuron):

z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b

z = (0.5)(1500) + (0.3)(3) + (-0.1)(10) + 50 = 799.9
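The weighted sum is a single dot product; a quick NumPy check of the arithmetic above (the weights and bias are the illustrative values from the example, not trained parameters):

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])   # size, bedrooms, age
w = np.array([0.5, 0.3, -0.1])      # illustrative weights
b = 50.0                            # bias

z = np.dot(w, x) + b                # weighted sum plus bias
print(z)                            # ≈ 799.9
```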

2.2 Activation Function

The linear output is passed through an activation function to introduce non-linearity:

a = f(z)

Using ReLU:

a = \max(0, z) = \max(0, 799.9) = 799.9
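ReLU itself is a one-liner; a minimal sketch:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values are clipped to zero."""
    return np.maximum(0.0, z)

print(relu(799.9))   # positive input passes through unchanged
print(relu(-5.0))    # negative input becomes 0.0
```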


3. Multi-Layer Forward Pass

For a network with L layers:

\mathbf{z}^{[l]} = \mathbf{W}^{[l]} \cdot \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}

\mathbf{a}^{[l]} = f\left(\mathbf{z}^{[l]}\right)

Where \mathbf{a}^{[0]} = \mathbf{x} (the raw input).

Two-layer example:

\mathbf{z}^{[1]} = \mathbf{W}^{[1]}\mathbf{x} + \mathbf{b}^{[1]}, \quad \mathbf{a}^{[1]} = \text{ReLU}(\mathbf{z}^{[1]})

\mathbf{z}^{[2]} = \mathbf{W}^{[2]}\mathbf{a}^{[1]} + \mathbf{b}^{[2]}, \quad \hat{y} = \mathbf{z}^{[2]}
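The layer-by-layer recurrence can be sketched as a loop (a minimal NumPy implementation; the layer sizes and random weights below are arbitrary illustrative choices):

```python
import numpy as np

def forward(x, params):
    """Run a forward pass; params is a list of (W, b) pairs, one per layer.
    ReLU on hidden layers, identity (linear) on the output layer."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b                    # linear transformation
        if i < len(params) - 1:
            a = np.maximum(0.0, z)       # hidden layer: ReLU
        else:
            a = z                        # output layer: linear
    return a

# Two-layer example matching the equations above (3 inputs -> 4 hidden -> 1 output)
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
y_hat = forward(np.array([1.0, 2.0, 3.0]), params)
print(y_hat.shape)   # a single scalar prediction, shape (1,)
```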


4. Architecture Diagram

Input (3 features) → Hidden layer (ReLU) → Output (predicted price)

5. Step-by-Step Numeric Example

Setup:

  • Input: \mathbf{x} = [1500, 3, 10]
  • Weights layer 1: \mathbf{W}^{[1]} = [[0.5, 0.3, -0.1]]
  • Bias layer 1: b^{[1]} = 50

Step 1 — Linear:

z^{[1]} = 0.5(1500) + 0.3(3) + (-0.1)(10) + 50 = 799.9

Step 2 — Activate:

a^{[1]} = \text{ReLU}(799.9) = 799.9

Step 3 — Output layer (scale to dollars with learned weight w = 1000):

\hat{y} = 1000 \cdot a^{[1]} = \$799{,}900
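Chaining the three steps in code reproduces the walkthrough (weights and biases are the illustrative values from the setup):

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])    # input features
W1 = np.array([[0.5, 0.3, -0.1]])    # layer-1 weights
b1 = np.array([50.0])                # layer-1 bias

z1 = W1 @ x + b1                     # Step 1: linear -> ≈ 799.9
a1 = np.maximum(0.0, z1)             # Step 2: ReLU (positive, so unchanged)
y_hat = 1000.0 * a1                  # Step 3: output layer scales to dollars
print(y_hat)                         # ≈ [799900.]
```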


6. Key Components Summary

| Component | Symbol | Role | Analogy |
| --- | --- | --- | --- |
| Input | \mathbf{x} | Raw features | Raw ingredients |
| Weights | \mathbf{W} | Feature importance | Recipe proportions |
| Bias | \mathbf{b} | Baseline offset | Chef's personal touch |
| Activation | f(z) | Non-linearity | Quality filter |
| Output | \hat{y} | Prediction | Final dish |

7. What Happens Next?

The forward pass gives us a prediction \hat{y}. But how do we know if it's good?

  • The prediction (\hat{y} = \$799{,}900) is compared to the **true value** (y = \$300{,}000)
  • The error must be quantified — that's the job of the Loss Function (Article 2)
  • The weights must be improved — that's Backpropagation (Article 3)

8. Quick Reference

\boxed{z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = f(z^{[l]})}

Filed under: ai
