CosyVoice 2
Streaming and offline speech synthesis in one model — an FSQ semantic tokenizer, a Qwen2.5-0.5B text-speech LM, and a chunk-aware causal flow-matching mel decoder.
Du et al. · 2024 · Speech / TTS.
A free, interactive, animated visual explainer of CosyVoice 2 — every exhibit computed from the real formulas, with verbatim quotes from the source.
Questions
- What is CosyVoice 2?
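- CosyVoice 2 is a speech synthesis (TTS) model that handles both streaming and offline generation in a single model. It combines an FSQ (finite scalar quantization) semantic tokenizer, a Qwen2.5-0.5B text-speech LM, and a chunk-aware causal flow-matching mel decoder.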
- Who published CosyVoice 2, and where?
- Du et al. published it in 2024 on arXiv (arXiv:2412.10117).
- Where can I find a visual explainer of CosyVoice 2?
- Right here: this page is a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.