CL, AI
Feb 19, 2024

Is It a Free Lunch for Removing Outliers during Pretraining?

arXiv:2402.12102v1
2 citations
h-index: 17
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in quantization for large language models, offering an incremental improvement for efficient deployment.

The paper tackles outliers in large language models, which degrade the performance of quantized models. It modifies an existing softmax-based outlier-removal method so that its normalization is invariant to sequence length. This improves suitability for quantization without degrading full-precision performance, and it enables pretraining of causal language models.

With the growing size of large language models, the role of quantization becomes increasingly significant. However, outliers present in weights or activations notably influence the performance of quantized models. Recently, Bondarenko et al. (2023) introduced a novel softmax function aimed at pretraining models in an outlier-free manner, thereby enhancing their suitability for quantization. Interestingly, we observed that such an approach leads to performance degradation in full precision. Building on this insight, we enhance the method by ensuring its normalization is invariant to sequence length, a crucial factor for bridging the gap between pretraining and fine-tuning. Moreover, this improved method also facilitates successful pretraining of causal language models.
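To make the sequence-length issue concrete, here is a minimal NumPy sketch of a clipped softmax of the kind the abstract refers to. It assumes the common stretch-and-clip form clip((zeta - gamma) * softmax(x) + gamma, 0, 1); the parameter names `gamma` and `zeta`, the value gamma = -0.03, and the helper `zero_fraction` are illustrative assumptions, not taken from this page. It only illustrates why a fixed clipping threshold interacts with sequence length; it is not the length-invariant method proposed in the paper, which the abstract does not spell out.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def clipped_softmax(x, gamma=-0.03, zeta=1.0, axis=-1):
    # Assumed form: stretch the softmax output from [0, 1] to [gamma, zeta],
    # then clip back to [0, 1], so small probabilities become exactly zero.
    p = softmax(x, axis=axis)
    return np.clip((zeta - gamma) * p + gamma, 0.0, 1.0)

def zero_fraction(seq_len, gamma=-0.03, seed=0):
    # Hypothetical helper: fraction of attention probabilities clipped to
    # exactly zero for random logits of a given sequence length.
    rng = np.random.default_rng(seed)
    logits = rng.standard_normal(seq_len)
    return float(np.mean(clipped_softmax(logits, gamma=gamma) == 0.0))

if __name__ == "__main__":
    # Softmax probabilities shrink roughly like 1 / seq_len, so with a fixed
    # gamma a larger share of them falls below the clipping threshold.
    for seq_len in (16, 128, 1024):
        print(seq_len, zero_fraction(seq_len))
```

Under these assumptions, the same `gamma` clips a very different share of the attention distribution at short and long sequence lengths. This is one way to see why a normalization that depends on sequence length can create a mismatch between pretraining and fine-tuning, which typically use different lengths.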

