Fish Audio S2
Why one autoregressive stream is not enough — a slow Qwen3-4B semantic backbone, a fast depth-wise head, and an RVQ codec, without the N× flatten.
Fish Audio Team · 2026 · Speech / TTS
A free, interactive, animated visual explainer of Fish Audio S2 — every exhibit computed from the real formulas, with verbatim quotes from the source.
Questions
- What is Fish Audio S2?
  - A speech / TTS model from the Fish Audio Team that pairs a slow Qwen3-4B semantic backbone with a fast depth-wise head and an RVQ codec, so a single autoregressive stream does not have to be flattened N×.
- Who published Fish Audio S2, and where?
  - The Fish Audio Team, in 2026 (arXiv:2603.08823).
- Where can I find a visual explainer of Fish Audio S2?
  - Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.