Who published Pretraining Large Language Models with NVFP4, and where?

NVIDIA — arXiv 2025 (arXiv:2509.25149).

Pretraining Large Language Models with NVFP4

Pretrain in 4-bit floating point: with micro-block scaling and Hadamard transforms, NVFP4 training reaches loss and accuracy comparable to FP8 over 10T tokens.

NVIDIA · arXiv 2025 · Foundations.

A free, interactive, animated visual explainer of Pretraining Large Language Models with NVFP4. Every exhibit is computed from the real formulas, with verbatim quotes from the source.
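To give the micro-block scaling named in the summary a concrete shape, here is a minimal sketch of simulated 4-bit quantization with one scale per block. It assumes the NVFP4 layout of E2M1 elements in blocks of 16. The scale is kept as an ordinary Python float rather than an FP8 value, rounding is round-to-nearest, and the Hadamard transform is left out. The function names (`quantize_block`, `quantize`, `mse`) are illustrative and are not taken from the paper.

```python
import math

# Magnitudes a 4-bit E2M1 float can represent (the sign bit is separate).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # assumed NVFP4 micro-block size


def quantize_block(block):
    """Simulate quantizing one block: scale, snap to the grid, scale back."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest grid value.
    # Simplification: a real format would store this scale in low precision.
    scale = amax / E2M1_GRID[-1]
    out = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_GRID, key=lambda g: abs(target - g))
        out.append(math.copysign(mag * scale, x))
    return out


def quantize(values, block=BLOCK):
    """Quantize a flat list of floats, one scale per `block` elements."""
    out = []
    for i in range(0, len(values), block):
        out.extend(quantize_block(values[i:i + block]))
    return out


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Two blocks of small values, with a single outlier placed in the first block.
values = [0.1 * (i % 7 + 1) for i in range(32)]
values[3] = 64.0

per_block = quantize(values)                    # one scale per 16 elements
one_scale = quantize(values, block=len(values))  # one scale for everything

print("block-scaled MSE:", mse(values, per_block))
print("single-scale MSE:", mse(values, one_scale))
```

With a single scale, the outlier sets the range for every element, so the small values in the second block all round to zero. With per-block scales, the damage stays inside the block that holds the outlier.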
