LG · Nov 30, 2022

Quadapter: Adapter for GPT-2 Quantization

arXiv:2211.16912v1299 citations · h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses overfitting during quantization of pretrained language models such as GPT-2. The advance is incremental because it builds on existing quantization-aware training methods.

The paper tackles quantizing GPT-2 without overfitting by introducing Quadapter, a small set of learned parameters that scale activations channel-wise to make them quantization-friendly. The model's own parameters stay unchanged, and quantization performance improves.

Transformer language models such as GPT-2 are difficult to quantize because of outliers in activations leading to a large quantization error. To adapt to the error, one must use quantization-aware training, which entails a fine-tuning process based on the dataset and the training pipeline identical to those for the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to rely on arbitrary ones for fine-tuning. In that case, it is observed that quantization-aware training overfits the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters that are learned to make activations quantization-friendly by scaling them channel-wise. It keeps the model parameters unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents the overfitting and improves the quantization performance.
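The effect of channel-wise scaling on quantization error can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract says only that the Quadapter parameters are learned and scale activations channel-wise. Here the scale vector `alpha` is set by a simple max-based heuristic instead of being learned, the inverse scaling after quantization is an assumption, and the names `fake_quantize`, `alpha`, `x`, and `w` are hypothetical.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Symmetric per-tensor uniform quantization, then dequantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax          # one step size for the whole tensor
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 16))                # activations: (tokens, channels)
x[:, 3] *= 50.0                               # one outlier channel, as in the problem described above
w = rng.normal(size=(16, 8))                  # weights of the next layer, left unchanged

# Assumption: a max-based heuristic stands in for the learned scales.
alpha = 1.0 / np.max(np.abs(x), axis=0)       # one scale per channel

y_ref = x @ w                                 # full-precision output
y_plain = fake_quantize(x) @ w                # quantize the raw activations
# Scale channel-wise, quantize, then undo the scale (assumed inverse step).
y_scaled = (fake_quantize(x * alpha) / alpha) @ w

err_plain = np.mean((y_plain - y_ref) ** 2)
err_scaled = np.mean((y_scaled - y_ref) ** 2)
print(f"MSE without scaling: {err_plain:.5f}")
print(f"MSE with channel-wise scaling: {err_scaled:.5f}")
```

Without scaling, the outlier channel sets the step size for the whole tensor, so the ordinary channels are rounded coarsely. With scaling, every channel occupies a similar range before quantization, and only the outlier channel carries a large error after the inverse step.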

