ai · 2025-11-08 · 3 min read

The Forward Pass

The journey of data through a neural network — and what actually happens at each layer.

1. Intuition

The forward pass is the process of feeding input data through a neural network layer by layer to produce a prediction. Think of it like an assembly line where raw materials (input features) pass through multiple workstations (layers), each transforming the data, until a final product (prediction) emerges.

Real-Life Example 🏠 — Predicting House Prices

| Feature | Value |
| --- | --- |
| Size (sq ft) | 1500 |
| Bedrooms | 3 |
| Age (years) | 10 |

A neural network processes these features through layers of "expert committees", each combining inputs with learned weights to form progressively more abstract representations.


2. Core Operations

2.1 Linear Transformation

At each layer, inputs are combined using a weighted sum:

z = \mathbf{W} \cdot \mathbf{x} + \mathbf{b}

Where:

  • \mathbf{W} — weight matrix (learned parameters)
  • \mathbf{x} — input vector
  • \mathbf{b} — bias vector (baseline shift)
  • z — pre-activation output (logit)

House price example (single neuron):

z = w_1 x_1 + w_2 x_2 + w_3 x_3 + b

z = (0.5)(1500) + (0.3)(3) + (-0.1)(10) + 50 = 799.9
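The weighted sum is a single dot product; a quick NumPy check of the arithmetic above (the weights and bias are the illustrative values from the example, not trained parameters):

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])   # size, bedrooms, age
w = np.array([0.5, 0.3, -0.1])      # illustrative weights
b = 50.0                            # bias

z = np.dot(w, x) + b                # weighted sum plus bias
print(z)                            # ≈ 799.9
```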

2.2 Activation Function

The linear output is passed through an activation function to introduce non-linearity:

a = f(z)

Using ReLU:

a = \max(0, z) = \max(0, 799.9) = 799.9
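ReLU itself is a one-liner; a minimal sketch:

```python
import numpy as np

def relu(z):
    """Element-wise ReLU: negative values are clipped to zero."""
    return np.maximum(0.0, z)

print(relu(799.9))   # positive input passes through unchanged
print(relu(-5.0))    # negative input becomes 0.0
```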


3. Multi-Layer Forward Pass

For a network with L layers:

\mathbf{z}^{[l]} = \mathbf{W}^{[l]} \cdot \mathbf{a}^{[l-1]} + \mathbf{b}^{[l]}

\mathbf{a}^{[l]} = f\left(\mathbf{z}^{[l]}\right)

Where \mathbf{a}^{[0]} = \mathbf{x} (the raw input).

Two-layer example:

\mathbf{z}^{[1]} = \mathbf{W}^{[1]}\mathbf{x} + \mathbf{b}^{[1]}, \quad \mathbf{a}^{[1]} = \text{ReLU}(\mathbf{z}^{[1]})

\mathbf{z}^{[2]} = \mathbf{W}^{[2]}\mathbf{a}^{[1]} + \mathbf{b}^{[2]}, \quad \hat{y} = \mathbf{z}^{[2]}
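The layer-by-layer recurrence can be sketched as a loop (a minimal NumPy implementation; the layer sizes and random weights below are arbitrary illustrative choices):

```python
import numpy as np

def forward(x, params):
    """Run a forward pass; params is a list of (W, b) pairs, one per layer.
    ReLU on hidden layers, identity (linear) on the output layer."""
    a = x
    for i, (W, b) in enumerate(params):
        z = W @ a + b                    # linear transformation
        if i < len(params) - 1:
            a = np.maximum(0.0, z)       # hidden layer: ReLU
        else:
            a = z                        # output layer: linear
    return a

# Two-layer example matching the equations above (3 inputs -> 4 hidden -> 1 output)
rng = np.random.default_rng(0)
params = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(1, 4)), np.zeros(1))]
y_hat = forward(np.array([1.0, 2.0, 3.0]), params)
print(y_hat.shape)   # a single scalar prediction, shape (1,)
```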


4. Architecture Diagram

Input (3 features) → Hidden layer (ReLU) → Output (predicted price)

5. Step-by-Step Numeric Example

Setup:

  • Input: \mathbf{x} = [1500, 3, 10]
  • Weights layer 1: \mathbf{W}^{[1]} = [[0.5, 0.3, -0.1]]
  • Bias layer 1: b^{[1]} = 50

Step 1 — Linear:

z^{[1]} = 0.5(1500) + 0.3(3) + (-0.1)(10) + 50 = 799.9

Step 2 — Activate:

a^{[1]} = \text{ReLU}(799.9) = 799.9

Step 3 — Output layer (scale to dollars with learned weight w = 1000):

\hat{y} = 1000 \cdot a^{[1]} = \$799{,}900
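Chaining the three steps in code reproduces the walkthrough (weights and biases are the illustrative values from the setup):

```python
import numpy as np

x = np.array([1500.0, 3.0, 10.0])    # input features
W1 = np.array([[0.5, 0.3, -0.1]])    # layer-1 weights
b1 = np.array([50.0])                # layer-1 bias

z1 = W1 @ x + b1                     # Step 1: linear -> ≈ 799.9
a1 = np.maximum(0.0, z1)             # Step 2: ReLU (positive, so unchanged)
y_hat = 1000.0 * a1                  # Step 3: output layer scales to dollars
print(y_hat)                         # ≈ [799900.]
```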


6. Key Components Summary

| Component | Symbol | Role | Analogy |
| --- | --- | --- | --- |
| Input | \mathbf{x} | Raw features | Raw ingredients |
| Weights | \mathbf{W} | Feature importance | Recipe proportions |
| Bias | \mathbf{b} | Baseline offset | Chef's personal touch |
| Activation | f(z) | Non-linearity | Quality filter |
| Output | \hat{y} | Prediction | Final dish |

7. What Happens Next?

The forward pass gives us a prediction \hat{y}. But how do we know if it's good?

  • The prediction (\hat{y} = \$799{,}900) is compared to the **true value** (y = \$300{,}000)
  • The error must be quantified — that's the job of the Loss Function (Article 2)
  • The weights must be improved — that's Backpropagation (Article 3)

8. Quick Reference

\boxed{z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}, \qquad a^{[l]} = f(z^{[l]})}

Filed under: ai
