Fish Audio S2

Why one autoregressive stream is not enough — a slow Qwen3-4B semantic backbone, a fast depth-wise head, and an RVQ codec, without the N× flatten.

Fish Audio Team · 2026 · Speech / TTS.

A free, interactive, animated visual explainer of Fish Audio S2 — every exhibit computed from the real formulas, with verbatim quotes from the source.

Questions

What is Fish Audio S2?
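A speech (TTS) model built on the argument that one autoregressive stream is not enough. It pairs a slow Qwen3-4B semantic backbone with a fast depth-wise head on top of an RVQ codec, so the multi-codebook audio tokens do not have to be flattened into a single sequence N times as long.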
Who published Fish Audio S2, and where?
Fish Audio Team — 2026 (arXiv:2603.08823).
Where can I find a visual explainer of Fish Audio S2?
Right here — a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.
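To make the "N× flatten" concrete, here is a minimal sketch, not the paper's implementation. It assumes a generic residual vector quantizer (RVQ) with N codebooks that emits N integer codes per audio frame, and it reads "flatten" as serializing all N codes of every frame into one token stream. The function names `rvq_encode`, `rvq_decode` and `decode_steps`, and all sizes in the usage lines, are hypothetical. The step count also glosses over whether the backbone itself emits the first code of each frame; it counts up to N fast-head steps per frame.

```python
import numpy as np

def rvq_encode(frames, codebooks):
    """Greedy residual vector quantization (generic sketch, not S2's codec).

    frames: (T, D) continuous frame embeddings.
    codebooks: list of N arrays, each (K, D).
    Returns a (T, N) array of codeword indices.
    """
    residual = frames.astype(float).copy()
    codes = []
    for cb in codebooks:
        # Squared distance from each residual to each codeword: shape (T, K).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        # The next codebook only sees what this one failed to capture.
        residual = residual - cb[idx]
    return np.stack(codes, axis=1)

def rvq_decode(codes, codebooks):
    """Reconstruct frames by summing the chosen codeword from every codebook."""
    return sum(cb[codes[:, n]] for n, cb in enumerate(codebooks))

def decode_steps(num_frames, num_codebooks):
    """Count autoregressive steps under the two layouts (assumed accounting)."""
    return {
        # Flattened: the large model takes one step per code, T * N in total.
        "flat_backbone_steps": num_frames * num_codebooks,
        # Two-level: the large model takes one step per frame...
        "dual_backbone_steps": num_frames,
        # ...and a small depth-wise head fills in up to N codes per frame.
        "dual_fast_steps": num_frames * num_codebooks,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, K, N = 50, 8, 16, 4  # hypothetical sizes
    frames = rng.normal(size=(T, D))
    books = [rng.normal(size=(K, D)) * 0.5 ** n for n in range(N)]
    codes = rvq_encode(frames, books)
    print(codes.shape)          # (50, 4)
    print(decode_steps(T, N))   # backbone steps: 200 flattened vs 50 two-level
```

The point of the split is that the expensive Qwen3-4B-sized model runs once per frame rather than once per code, while the per-frame work across codebooks moves to a much cheaper head.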
