Spark-TTS
One LM, one token stream, no acoustic model: a Qwen2.5-0.5B language model autoregressively generates a single flat stream of BiCodec tokens, and a GAN decoder turns them straight into 16 kHz audio.
Wang et al. · 2025 · Speech / TTS.
A free, interactive, animated visual explainer of Spark-TTS — every exhibit computed from the real formulas, with verbatim quotes from the source.
Questions
- What is Spark-TTS?
  - One LM, one token stream, no acoustic model: a Qwen2.5-0.5B language model autoregressively generates a single flat stream of BiCodec tokens, and a GAN decoder turns them straight into 16 kHz audio.
- Who published Spark-TTS, and where?
  - Wang et al., 2025, on arXiv (arXiv:2503.01710).
- Where can I find a visual explainer of Spark-TTS?
  - Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.