Zero-Shot Quantization via Weight-Space Arithmetic

Daniele Solombrino, Antonio Andrea Gargiulo, Adrian Robert Minut, Luca Zhou, Alessandro Zirilli, Emanuele Rodolà

arXiv:2604.03420

AI Analysis

Provides a zero-shot, low-cost alternative to quantization-aware training for deploying Vision Transformers at extremely low bitwidths.

The paper shows that robustness to post-training quantization (PTQ) can be transferred between models via a quantization vector in weight space, improving PTQ robustness by up to 60% without quantization-aware training or receiver-side data.

We show that robustness to post-training quantization (PTQ) is a transferable direction in weight space. We call this direction the quantization vector: extracted from a donor task by simple weight-space arithmetic, it can be used to patch a receiver model and improve robustness to PTQ-induced noise by as much as 60%, without receiver-side quantization-aware training (QAT). Because the method requires no receiver training data, it provides a zero-shot, low-cost alternative to QAT for extremely low-bit deployment. We demonstrate this on Vision Transformer (ViT) models. More broadly, our results suggest that quantization robustness is not merely a byproduct of task-specific training, but a reusable feature of weight-space geometry that can be transferred rather than retrained.
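The abstract does not spell out the arithmetic, so the following is only a minimal sketch under stated assumptions. By analogy with task-vector arithmetic, it assumes the quantization vector is the per-tensor difference between a quantization-robust donor checkpoint and its ordinary counterpart, and that patching adds a scaled copy of that difference to the receiver. The function names, the `scale` parameter, the toy weights, and the uniform quantizer are all hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def quantization_vector(donor_robust, donor_base):
    # ASSUMPTION: the vector is a plain per-tensor difference between two
    # donor checkpoints that share the same architecture and parameter names.
    return {k: donor_robust[k] - donor_base[k] for k in donor_base}

def patch(receiver, qvec, scale=1.0):
    # ASSUMPTION: patching is additive; `scale` is a hypothetical coefficient.
    return {k: receiver[k] + scale * qvec[k] for k in receiver}

def fake_quantize(w, bits):
    # Generic symmetric, per-tensor uniform quantize-dequantize, used here
    # only as a stand-in for a PTQ scheme.
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    step = max_abs / qmax
    return np.clip(np.round(w / step), -qmax, qmax) * step

# Toy stand-ins for real ViT state dicts (random data, for shape only).
rng = np.random.default_rng(0)
shape = (4, 4)
donor_base = {"layer.weight": rng.normal(size=shape)}
donor_robust = {
    "layer.weight": donor_base["layer.weight"] + 0.01 * rng.normal(size=shape)
}
receiver = {"layer.weight": rng.normal(size=shape)}

qvec = quantization_vector(donor_robust, donor_base)
patched = patch(receiver, qvec, scale=1.0)

# The patched receiver is then quantized post-training, with no receiver data.
quantized = {k: fake_quantize(v, bits=3) for k, v in patched.items()}
```

With random toy weights this only demonstrates the data flow (extract, patch, then quantize); it says nothing about accuracy, which would require real donor and receiver checkpoints.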
