GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Quantize a 175B model to 3–4 bits in a few GPU-hours with a one-shot, Hessian-aware solver.

Frantar et al. · ICLR 2023 · Serving

A free, interactive, animated visual explainer of GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers. Every exhibit is computed from the paper's formulas and accompanied by verbatim quotes from the source.

Questions

What is GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers?
GPTQ is a one-shot, Hessian-aware post-training quantization method that compresses a 175B-parameter model to 3–4 bits per weight in a few GPU-hours.
Who published GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers, and where?
Frantar et al. published it at ICLR 2023 (arXiv:2210.17323).
Where can I find a visual explainer of GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers?
Right here: a free, interactive, animated walkthrough of the whole paper, with exhibits computed from the paper's formulas and verbatim quotes from the source.
