Deep learning foundations

The handful of papers the whole field stands on: the optimizer everyone uses, the trick that made very deep networks trainable, the attention mechanism, the scaling laws, and the cheap way to adapt a giant model.