ai · 2025-11-01 · 2 min read

AI Backbone — Complete Study Guide

10 articles covering the fundamental concepts of modern AI — from forward passes to RLHF.




Article Index

| # | Topic | Core Formula | Key Concept |
|---|-------|--------------|-------------|
| 01 | Forward Pass | $\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}$ | Data flows left → right through layers |
| 02 | Loss Functions | $L = (y - \hat{y})^2$ | Measuring how wrong predictions are |
| 03 | Backpropagation | $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a}\cdot\frac{\partial a}{\partial z}\cdot\frac{\partial z}{\partial w}$ | Chain rule distributes blame |
| 04 | Gradient Descent | $w \leftarrow w - \alpha\nabla L$ | Walk downhill on the loss landscape |
| 05 | Activation Functions | $\text{ReLU}(z) = \max(0,z)$ | Non-linearity enables complex learning |
| 06 | Embeddings | $\mathbf{e}_i = \mathbf{E}[i]$ | Discrete tokens → dense meaning vectors |
| 07 | Attention & Transformers | $\text{softmax}(\mathbf{QK}^T/\sqrt{d_k})\mathbf{V}$ | Every token attends to every token |
| 08 | RLHF | $r - \beta D_{KL}(\pi_\theta \Vert \pi_{SFT})$ | Align model with human values |
| 09 | Regularization | $L + \lambda\sum w^2$ | Generalize, don't memorize |
| 10 | Tokenization | $1\ \text{token} \approx 0.75\ \text{words}$ | Text → numbers the model can process |
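To make the first formula in the index concrete, here is a minimal sketch of a single-layer forward pass, $\mathbf{z} = \mathbf{W}\mathbf{x} + \mathbf{b}$, followed by a ReLU. The layer sizes (4 inputs, 3 outputs) are arbitrary toy values, not from any of the articles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer sizes for illustration: 4 inputs -> 3 outputs
W = rng.standard_normal((3, 4))  # weight matrix
b = rng.standard_normal(3)       # bias vector
x = rng.standard_normal(4)       # input vector

# Forward pass for one layer: z = Wx + b, then a non-linearity
z = W @ x + b
a = np.maximum(0.0, z)           # ReLU(z) = max(0, z)

print(z.shape, a.shape)          # both (3,)
```

Stacking several such layers, each feeding its activations into the next, is exactly the "data flows left → right" picture from article 01.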

Quick-Glance Cheat Sheet

The Training Loop

$$\underbrace{x \to \hat{y}}_{\text{forward pass}} \;\to\; \underbrace{L(\hat{y}, y)}_{\text{loss}} \;\to\; \underbrace{\nabla_W L}_{\text{backprop}} \;\to\; \underbrace{W \leftarrow W - \alpha\nabla L}_{\text{gradient descent}}$$
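The whole loop fits in a few lines. This sketch fits a one-parameter model $\hat{y} = wx$ to a single made-up data point $(x, y) = (2, 6)$ with the squared loss from article 02, computing the gradient by hand via the chain rule:

```python
# One-parameter model y_hat = w * x, squared loss L = (y - y_hat)^2.
# Chain rule gives dL/dw = -2 * (y - y_hat) * x.
x, y = 2.0, 6.0        # single toy example; the ideal weight is 3
w, alpha = 0.0, 0.05   # initial weight and learning rate

for _ in range(100):
    y_hat = w * x                  # forward pass
    grad = -2.0 * (y - y_hat) * x  # backprop
    w -= alpha * grad              # gradient descent step

print(round(w, 3))  # -> 3.0
```

Each iteration walks one step downhill on the loss surface; after enough steps the weight settles at the value that makes the loss zero.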

Key Activations

| Function | Formula | Use |
|----------|---------|-----|
| ReLU | $\max(0,z)$ | Hidden layers |
| Sigmoid | $\frac{1}{1+e^{-z}}$ | Binary output |
| Softmax | $\frac{e^{z_i}}{\sum_j e^{z_j}}$ | Multi-class output |
| GELU | $z\cdot\Phi(z)$ | Transformers |
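All four activations are one-liners in NumPy. The function names below are mine, not from the articles; note the max-subtraction trick in softmax for numerical stability, and that GELU uses the exact normal CDF $\Phi$ rather than the common tanh approximation:

```python
import numpy as np
from math import erf, sqrt

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def gelu(z):
    # exact GELU: z * Phi(z), with Phi the standard normal CDF
    return np.array([0.5 * v * (1.0 + erf(v / sqrt(2.0)))
                     for v in np.atleast_1d(z)])

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))           # [0. 0. 2.]
print(sigmoid(0.0))      # 0.5
print(softmax(z).sum())  # sums to 1: a probability distribution
```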

Attention Formula

$$\text{Attention}(Q,K,V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$$
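The formula translates directly into a few lines of NumPy. This is a single-head sketch with toy shapes (3 tokens, $d_k = 4$, value dimension 8) chosen for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    # similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # row-wise softmax: each token's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # output = weighted mix of value vectors
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 tokens, d_k = 4
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 8))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per token
```

The `weights` matrix is the "every token attends to every token" part: row $i$ says how much token $i$ draws from each other token's value vector.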

Regularization at a Glance

| Method | Prevents | How |
|--------|----------|-----|
| L2 | Large weights | $+\lambda\sum w^2$ |
| Dropout | Co-dependency | Random zeroing |
| Early Stopping | Over-training | Monitor val loss |
| Batch Norm | Covariate shift | Normalize activations |
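The first two rows of the table can be sketched in code. Function names and the default $\lambda = 0.01$ and $p = 0.5$ are illustrative choices, not values from the articles; the dropout version shown is "inverted dropout", which rescales survivors so expected activations match at test time:

```python
import numpy as np

def l2_regularized_loss(y, y_hat, w, lam=0.01):
    # total loss = data loss + lambda * sum(w^2);
    # the penalty shrinks weights toward zero
    data_loss = np.mean((y - y_hat) ** 2)
    return data_loss + lam * np.sum(w ** 2)

def dropout(a, p=0.5, seed=0):
    # training-time only: zero each activation with probability p,
    # scaling survivors by 1/(1-p) (inverted dropout)
    rng = np.random.default_rng(seed)
    mask = rng.random(a.shape) >= p
    return a * mask / (1.0 - p)
```

Early stopping and batch norm are procedural rather than formulaic: the former just halts training when validation loss stops improving, the latter normalizes each layer's activations to zero mean and unit variance per batch.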

Generated from a full conversational deep-dive into AI foundations. Each article contains: intuition, real-life analogy, math derivations, examples, and diagrams.

Filed under: ai
