Date
9 February 2026
4:00pm ET
Transformers
Speaker
Helen Qu, Flatiron
Build a decoder-only transformer (a small GPT-like language model) from scratch in PyTorch. Train it on the Tiny Shakespeare dataset for character-level language modeling and use it to generate text, understanding every component along the way.
Topics Covered:
- Self-attention as a learned, data-dependent mixing operator
- Causal (masked) self-attention for autoregressive modeling (sketched in code after this list)
- Building a GPT-style Transformer block from scratch
- Token and positional embeddings (see the second sketch below)
- Training a small autoregressive language model (see the third sketch below)
- Text generation with temperature and top-k sampling (see the final sketch below)
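
As a preview of the attention material, here is a minimal sketch of single-head causal self-attention in PyTorch. The hyperparameter names (`d_model` for the embedding width, `block_size` for the maximum context length) are illustrative assumptions, not taken from the session itself.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head self-attention with a causal mask."""

    def __init__(self, d_model: int, block_size: int):
        super().__init__()
        # Learned projections turn each input vector into a query, key, and value.
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        # Lower-triangular mask: position t may only attend to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape  # batch, sequence length, embedding width
        q, k, v = self.query(x), self.key(x), self.value(x)
        # Scaled dot-product scores: a learned, data-dependent mixing pattern.
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(C)
        # Hide the future so the model stays autoregressive.
        scores = scores.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        return weights @ v  # each position is a weighted average of value vectors
```

Wrapping this layer in residual connections with LayerNorm and a small MLP yields the GPT-style Transformer block named above.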
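The embedding step is equally compact. A minimal sketch, assuming the character-level setup described in the abstract (Tiny Shakespeare has roughly 65 distinct characters) and the same hypothetical `d_model` and `block_size`:

```python
import torch
import torch.nn as nn

vocab_size, d_model, block_size = 65, 64, 128  # ~65 distinct characters in Tiny Shakespeare
tok_emb = nn.Embedding(vocab_size, d_model)    # one learned vector per character id
pos_emb = nn.Embedding(block_size, d_model)    # one learned vector per context position

idx = torch.randint(0, vocab_size, (4, 32))    # a toy batch: 4 sequences of 32 character ids
x = tok_emb(idx) + pos_emb(torch.arange(idx.shape[1]))  # (4, 32, d_model): input to the blocks
```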
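Training reduces to next-token prediction with cross-entropy. The sketch below uses a stand-in one-layer model and random targets purely so it runs on its own; in the real loop the targets are the inputs shifted one character to the left.

```python
import torch
import torch.nn.functional as F

vocab_size, B, T = 65, 4, 32
# Stand-in one-layer "model"; the session replaces this with the full Transformer.
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

x = torch.randint(0, vocab_size, (B, T))  # input character ids
y = torch.randint(0, vocab_size, (B, T))  # in practice: x shifted left by one character
logits = model(x)                         # (B, T, vocab_size)
loss = F.cross_entropy(logits.view(-1, vocab_size), y.view(-1))
optimizer.zero_grad()
loss.backward()
optimizer.step()
```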
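Finally, a sketch of sampling one token with temperature and top-k filtering, assuming `logits` holds the model's scores for the last position:

```python
import torch
import torch.nn.functional as F

def sample_next(logits: torch.Tensor, temperature: float = 1.0, top_k: int | None = None) -> int:
    """Sample one token id from a (vocab_size,) logits vector."""
    logits = logits / temperature  # <1 sharpens the distribution, >1 flattens it
    if top_k is not None:
        # Zero out everything outside the k most likely tokens.
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

next_id = sample_next(torch.randn(65), temperature=0.8, top_k=40)  # toy usage
```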