Who published Spark-TTS, and where?

Wang et al. — 2025 (arXiv:2503.01710).

Where can I find a visual explainer of Spark-TTS?

Right here — a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.

Spark-TTS

One LM, one token stream, no acoustic model: a Qwen2.5-0.5B language model autoregressively generates a single flat stream of BiCodec tokens, and a GAN decoder turns them straight into 16 kHz audio.
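The shape of that pipeline can be illustrated with a toy sketch. Nothing here is the Spark-TTS implementation: `toy_next_token` stands in for the language model, `toy_decoder` stands in for the decoder, and the `EOS` marker and the `TOKENS_PER_SECOND` value are assumptions chosen only for illustration. The point is the control flow: one loop emits one flat token stream, and a single decoding step maps that stream to 16 kHz samples with no intermediate acoustic model.

```python
import math
from typing import Callable, List

SAMPLE_RATE = 16_000        # output rate stated in the summary
TOKENS_PER_SECOND = 50      # illustrative assumption, not taken from this page
EOS = -1                    # hypothetical end-of-stream marker


def generate_stream(next_token: Callable[[List[int]], int],
                    prompt: List[int], max_new: int) -> List[int]:
    """Autoregressive loop: each new token is conditioned on the whole stream so far."""
    stream = list(prompt)
    generated: List[int] = []
    for _ in range(max_new):
        token = next_token(stream)
        if token == EOS:
            break
        stream.append(token)
        generated.append(token)
    return generated


def toy_next_token(stream: List[int]) -> int:
    """Stand-in for the language model: a deterministic function of the context."""
    if len(stream) >= 8:
        return EOS
    return (sum(stream) * 31 + len(stream)) % 100


def toy_decoder(tokens: List[int]) -> List[float]:
    """Stand-in for the decoder: each token becomes a short sine burst."""
    samples_per_token = SAMPLE_RATE // TOKENS_PER_SECOND
    audio: List[float] = []
    for token in tokens:
        freq = 100.0 + 10.0 * token
        audio.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                     for n in range(samples_per_token))
    return audio


tokens = generate_stream(toy_next_token, prompt=[1, 2, 3], max_new=20)
audio = toy_decoder(tokens)
print(len(tokens), "tokens ->", len(audio), "samples,", len(audio) / SAMPLE_RATE, "s")
```

In a real system the prompt would carry the text (and any speaker conditioning) and the decoder would be a trained neural network; the toy versions only keep the interfaces the same.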

Wang et al. · 2025 · Speech / TTS.

Related explainers

Learning Transferable Visual Models From Natural Language Supervision
Orpheus TTS
Fish Audio S2
IndexTTS2
CosyVoice 2
Higgs Audio v2
Chatterbox
Kokoro