Muon is Scalable for LLM Training

An orthogonalizing optimizer that beats Adam on the matrix parameters — scaled to LLM training.

Liu et al. · arXiv 2025 · Foundations

A free, interactive, animated visual explainer of "Muon is Scalable for LLM Training". Every exhibit is computed from the paper's own formulas and paired with verbatim quotes from the source.

Questions

What is Muon is Scalable for LLM Training?
It is a paper on Muon, an optimizer that orthogonalizes the updates to a model's matrix parameters, where it beats Adam. The paper scales Muon to LLM training.
Who published Muon is Scalable for LLM Training, and where?
Liu et al. published it on arXiv in 2025 (arXiv:2502.16982).
Where can I find a visual explainer of Muon is Scalable for LLM Training?
Right here: this page is a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the paper's own formulas and verbatim quotes from the source.
