Kokoro
TTS without the usual machinery: an 82M-parameter non-autoregressive StyleTTS2 + ISTFTNet model turns phonemes and a split style vector into 24 kHz audio in one feed-forward pass, with no LLM, no diffusion, no codec, and no tokens.
hexgrad · 2025 · Speech / TTS
A free, interactive, animated visual explainer of Kokoro. Every exhibit is computed from the real formulas, with verbatim quotes from the source.
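To make "one feed-forward pass" concrete, here is a toy sketch of the data flow only, not Kokoro's actual code. It splits a style vector in two, expands every phoneme by a duration in a single step (nothing is fed back in, which is what non-autoregressive means), and converts frames to samples at 24 kHz. The 256-dimension style vector, the equal halves, the phonemes, the durations, and the 600 samples per frame are all assumptions made for illustration. Only the 24 kHz rate comes from the text above.

```python
SAMPLE_RATE_HZ = 24_000  # from the text: 24 kHz output


def split_style(style):
    """Split one style vector into two equal halves (assumed layout)."""
    half = len(style) // 2
    return style[:half], style[half:]


def expand_by_duration(phonemes, durations):
    """Repeat each phoneme for its frame count, all in one step.

    No output is fed back as input, so there is no token-by-token loop.
    """
    if len(phonemes) != len(durations):
        raise ValueError("one duration per phoneme is required")
    return [p for p, d in zip(phonemes, durations) for _ in range(d)]


def samples_for(num_frames, samples_per_frame):
    """Total audio samples produced for a given number of frames."""
    return num_frames * samples_per_frame


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
phonemes = ["h", "ə", "l", "oʊ"]
durations = [2, 3, 2, 5]        # frames per phoneme (made up)
style = [0.1] * 256             # assumed 256-dimension style vector

first_half, second_half = split_style(style)
frames = expand_by_duration(phonemes, durations)
total_samples = samples_for(len(frames), samples_per_frame=600)  # assumed hop
seconds = total_samples / SAMPLE_RATE_HZ

print(len(first_half), len(second_half), len(frames), total_samples, seconds)
```

In a real model each half of the style vector would condition a different sub-network and the durations would be predicted rather than hard-coded. The sketch only shows that the output length is fixed up front, so the whole waveform can be produced in one pass.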
Questions
- What is Kokoro?
- Kokoro is an 82M-parameter, non-autoregressive text-to-speech model built on StyleTTS2 and ISTFTNet. It turns phonemes and a split style vector into 24 kHz audio in one feed-forward pass, with no LLM, no diffusion, no codec, and no tokens.
- Who published Kokoro, and where?
- hexgrad, in 2025 (the year of its official release).
- Where can I find a visual explainer of Kokoro?
- Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.