CosyVoice 2

Streaming and offline speech synthesis in one model — an FSQ semantic tokenizer, a Qwen2.5-0.5B text-speech LM, and a chunk-aware causal flow-matching mel decoder.

Du et al. · 2024 · Speech / TTS.

A free, interactive, animated visual explainer of CosyVoice 2 — every exhibit computed from the real formulas, with verbatim quotes from the source.

Questions

What is CosyVoice 2?
CosyVoice 2 is a text-to-speech model that handles both streaming and offline speech synthesis in one model. It combines an FSQ semantic tokenizer, a Qwen2.5-0.5B text-speech LM, and a chunk-aware causal flow-matching mel decoder.
Who published CosyVoice 2, and where?
Du et al. published it in 2024 as an arXiv preprint (arXiv:2412.10117).
Where can I find a visual explainer of CosyVoice 2?
Right here — a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.

Related explainers