Post-training LLMs

Start with a pretrained model that only predicts text. Finish with an understanding of every lever that turns it into an aligned, reasoning assistant: RLHF, DPO, the GRPO family, and the test-time-compute frontier.