How AI Learned to Speak

A from-zero walkthrough of how modern text-to-speech actually works: a sound wave as numbers, the neural codec that turns audio into tokens, and TTS as next-token prediction, all the way to today’s talking LLMs. It is built from the verified explainers listed below.

47 min · 27 chapters

Chapters

Built from these explainers