CL · Oct 20, 2023

Zero-Shot Sharpness-Aware Quantization for Pre-trained Language Models

arXiv:2310.13315v1 · 133 citations · h-index: 36
Originality: Highly original
AI Analysis

This work addresses the need for efficient model deployment under privacy constraints, offering a novel zero-shot quantization method for language models.

The paper tackles the problem of quantizing pre-trained language models without access to original training data, proposing a zero-shot sharpness-aware quantization framework that achieves up to a +6.98 average score improvement across 11 tasks.

Quantization is a promising approach for reducing memory overhead and accelerating inference, especially for large pre-trained language models (PLMs). Meanwhile, the lack of access to the original training data, due to security and privacy concerns, has created a demand for zero-shot quantization. Most cutting-edge zero-shot quantization methods 1) primarily apply to computer vision tasks, and 2) neglect the overfitting problem in the generative adversarial learning process, leading to sub-optimal performance. Motivated by this, we propose a novel zero-shot sharpness-aware quantization (ZSAQ) framework for the zero-shot quantization of various PLMs. The key algorithm in solving ZSAQ is the SAM-SGA optimization, which aims to improve quantization accuracy and model generalization by optimizing a minimax problem. We theoretically prove the convergence rate for this minimax optimization problem, and the result can be applied to other nonconvex-PL minimax optimization frameworks. Extensive experiments on 11 tasks demonstrate that our method brings consistent and significant performance gains on both discriminative and generative PLMs, with improvements of up to +6.98 average score. Furthermore, we empirically validate that our method can effectively improve model generalization.
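The alternating SAM-SGA idea can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it replaces the ZSAQ objective (whose exact form is not given here) with a hypothetical scalar function f(x, y) = 0.5x² + xy - 0.5y², where x stands in for the quantized model's parameters (the minimizing player) and y for the generator's parameters (the maximizing player). The function f is strongly concave in y, so it satisfies the PL condition in y. Each iteration takes one gradient-ascent step on y, then one sharpness-aware minimization (SAM) step on x, which perturbs x by ε = ρ·g/‖g‖ and descends using the gradient evaluated at the perturbed point. The step sizes, ρ, the update order, and the use of exact rather than stochastic gradients are all assumptions.

```python
# Hypothetical toy objective standing in for the ZSAQ minimax loss:
#   f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# x: stand-in for quantized-model parameters (minimized)
# y: stand-in for generator parameters (maximized)
# f is strongly concave in y, hence PL in y.


def grad_x(x: float, y: float) -> float:
    """Partial derivative of f with respect to x."""
    return x + y


def grad_y(x: float, y: float) -> float:
    """Partial derivative of f with respect to y."""
    return x - y


def sam_sga(x: float, y: float, lr_x: float = 0.1, lr_y: float = 0.2,
            rho: float = 0.05, steps: int = 500) -> tuple[float, float]:
    """Alternate gradient ascent on y with a SAM descent step on x.

    Assumption: exact gradients are used for clarity; a stochastic
    variant would replace grad_x / grad_y with noisy estimates.
    """
    for _ in range(steps):
        # Max player: one gradient-ascent step on the generator stand-in.
        y = y + lr_y * grad_y(x, y)

        # Min player, SAM step 1: move x toward the worst-case point
        # inside an L2 ball of radius rho (normalized gradient direction).
        g = grad_x(x, y)
        norm = abs(g)
        eps = rho * g / norm if norm > 0 else 0.0

        # SAM step 2: descend using the gradient at the perturbed point.
        x = x - lr_x * grad_x(x + eps, y)
    return x, y


if __name__ == "__main__":
    x_final, y_final = sam_sga(2.0, -1.0)
    print(f"x = {x_final:.4f}, y = {y_final:.4f}")
```

For this toy objective, max_y f(x, y) = x² is attained at y = x, so the saddle point is (0, 0) and both iterates should end near zero. Because the SAM perturbation has fixed radius ρ, x may keep oscillating in a small neighbourhood of the optimum rather than converging exactly.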
