Pretraining Large Language Models with NVFP4
Pretrain in 4-bit floating point: with micro-block scaling and Hadamard transforms, NVFP4 training matches an FP8 baseline over 10T tokens.
NVIDIA · arXiv 2025 · Foundations.
A free, interactive, animated visual explainer of Pretraining Large Language Models with NVFP4 — every exhibit computed from the real formulas, with verbatim quotes from the source.
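To make the micro-block scaling idea concrete, here is a minimal Python sketch, not the paper's implementation. It assumes the standard FP4 E2M1 value grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) and 16-element blocks that each share one scale. The names `quantize_block`, `dequantize_block`, and `quantize_tensor` are hypothetical. As a simplification, the scale is kept as an ordinary Python float, whereas NVFP4 stores block scales in FP8 alongside a per-tensor FP32 scale, and rounding here is plain round-to-nearest.

```python
# Illustrative sketch only: function names and structure are assumptions,
# not code from the paper or from any NVIDIA library.

# Magnitudes representable by a 4-bit E2M1 float (sign handled separately).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # elements that share one scale factor


def quantize_block(block):
    """Quantize one block; return (scale, codes) with codes on the E2M1 grid."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Map the block's largest magnitude onto the largest E2M1 value (6.0).
    # Simplification: the scale stays a Python float instead of FP8.
    scale = amax / E2M1_MAGNITUDES[-1]
    codes = []
    for x in block:
        target = abs(x) / scale
        # Round to the nearest representable magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from a block's scale and E2M1 codes."""
    return [scale * c for c in codes]


def quantize_tensor(values):
    """Split a flat list into blocks of BLOCK_SIZE and quantize each one."""
    return [
        quantize_block(values[i:i + BLOCK_SIZE])
        for i in range(0, len(values), BLOCK_SIZE)
    ]
```

Because every block carries its own scale, a single outlier only coarsens the 16 values that share its block rather than the whole tensor, which is the motivation for scaling at this granularity.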
Questions
- What is Pretraining Large Language Models with NVFP4?
  - A method for pretraining in 4-bit floating point: with micro-block scaling and Hadamard transforms, NVFP4 training matches an FP8 baseline over 10T tokens.
- Who published Pretraining Large Language Models with NVFP4, and where?
  - NVIDIA, on arXiv in 2025 (arXiv:2509.25149).
- Where can I find a visual explainer of Pretraining Large Language Models with NVFP4?
  - Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the real formulas and verbatim quotes from the source.
Related explainers
- Attention Is All You Need
- GPT-3: Language Models are Few-Shot Learners
- Mixtral of Experts
- Training Compute-Optimal Large Language Models
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- BERT: Pre-training of Deep Bidirectional Transformers
- Scaling Laws for Neural Language Models
- Adam: A Method for Stochastic Optimization