Who published Muon is Scalable for LLM Training, and where?

Liu et al. — arXiv 2025 (arXiv:2502.16982).

Where can I find a visual explainer of Muon is Scalable for LLM Training?

Right here — a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.

Muon is Scalable for LLM Training

An orthogonalizing optimizer that beats Adam on the matrix parameters — scaled to LLM training.

Liu et al. · arXiv 2025 · Foundations


Questions

What is Muon is Scalable for LLM Training?

An orthogonalizing optimizer that beats Adam on the matrix parameters, scaled to LLM training.

Related explainers

Attention Is All You Need
GPT-3: Language Models are Few-Shot Learners
Mixtral of Experts
Training Compute-Optimal Large Language Models
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
BERT: Pre-training of Deep Bidirectional Transformers
Scaling Laws for Neural Language Models
Adam: A Method for Stochastic Optimization