17 chapters · LLM internals

A field guide to large language models

Read deeply across the modern stack — architecture, pretraining, alignment, serving — then make it stick with theory, math, and an interactive coding workspace that reviews your solutions.

Start reading

Jump into coding practice

17 chapters

Long-form theory

Hands-on math

Worked derivations

Interactive code

AI-reviewed solutions

What's inside

Reading, until it actually sticks

Three tools woven through every chapter — an AI reviewer for your code, a tutor that re-explains anything you skim, and quiet progress tracking so you always know where you left off.

Feature

AI code review

Submit a solution, get a focused critique — correctness, style, complexity — in seconds.

import numpy as np

def softmax(x):
    # Subtract the max before exponentiating to avoid overflow.
    e = np.exp(x - x.max())
    return e / e.sum()

All 5 tests passed

Numerically stable. Consider axis=-1 for batched inputs.
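The reviewer's hint about batched inputs can be sketched as follows. This is a minimal illustration, not the site's reference solution: the `axis` parameter default, the `keepdims` handling, and the example `batch` array are assumptions added here.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis` (assumed default: last axis)."""
    x = np.asarray(x, dtype=float)
    # Subtract the max along `axis` so the largest exponent is exp(0) = 1.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    # keepdims=True keeps the reduced axis so the division broadcasts per row.
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical batch: two rows, normalized independently.
batch = np.array([[1.0, 2.0, 3.0],
                  [1000.0, 1000.0, 1000.0]])
probs = softmax(batch)
```

Without the `axis` argument, `x.max()` and `e.sum()` reduce over the whole array, so a batch would be normalized as a single distribution instead of one distribution per row.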

Feature

On-demand AI tutor

Stuck on a paragraph? One click and a tutor re-explains it with more intuition, more math, or a fresh angle.

Q · What is attention?

Attention weighs tokens by relevance.

Feature

Progress tracking

Chapters read, questions understood, problems solved — quietly tracked as you go.

Chapter 01 · 3/4

Tokenization
Attention
Positional encodings
Layer norm

The syllabus

Everything, in five parts

A guided path from the transformer block out to the research frontier. Every chapter pairs a deep read with practice and a first-class coding tab.

Part I

Foundations

3 chapters

Part II

Pretraining & Scale

4 chapters

Part III

Post-Training & Alignment

5 chapters

Part IV

Systems & Serving

2 chapters

Part V

Frontiers

3 chapters

Start at the beginning

Chapter 01 — Transformer Architecture Internals.

Open Chapter 01

Foundations

Transformer Architecture Internals

Tokenization & Embeddings

Attention Efficiency & Long Context

Pretraining & Scale

Pretraining Objectives & Scaling Laws

Optimization & Training Dynamics

Infrastructure, Distributed Training & Scaling

Mixture-of-Experts

Post-Training & Alignment

SFT, Instruction Tuning, Data & PEFT

RLHF, RL & Preference Optimization (Core)

Alignment Algorithms Zoo

Reasoning & Test-Time Compute

Evaluation, Reward Hacking & Alignment Methodology

Systems & Serving

Inference & Serving

Agents, Tool Use & Product Post-Training

Frontiers

Diffusion & Non-Autoregressive Language Models

Multimodal / Vision-Language (Lighter)

Research Engineering & Debugging
