LG | Jul 16, 2024

Exploring Quantization for Efficient Pre-Training of Transformer Language Models

arXiv:2407.11722v2 | 27 citations | h-index: 21 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses the efficiency problem facing researchers and practitioners who train large language models, but it is incremental: it applies known quantization techniques to a new stage (pre-training).

This study tackles the high computational cost of pre-training Transformer language models by applying quantization during pre-training itself. It finds that straightforward linear quantization of weights, activations, gradients, and optimizer states can improve training efficiency while retaining language modeling ability.

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in Transformers during pre-training has remained largely unexplored at scale for language modeling. This study aims to explore the impact of quantization for efficient pre-training of Transformers, with a focus on linear layer components. By systematically applying straightforward linear quantization to weights, activations, gradients, and optimizer states, we assess its effects on model efficiency, stability, and performance during training. By offering a comprehensive recipe of effective quantization strategies to be applied during the pre-training of Transformers, we promote high training efficiency from scratch while retaining language modeling ability. Code is available at https://github.com/chandar-lab/EfficientLLMs.
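As a rough illustration of what "straightforward linear quantization" means, the sketch below maps a list of floats to signed integer codes with a single scale factor and then maps them back. It assumes a symmetric, per-tensor scheme, which is only one common variant; the granularity, bit widths, and rounding used in the paper's recipe may differ. The function names (`quantize_linear`, `dequantize_linear`, `fake_quantize`) are hypothetical and are not taken from the EfficientLLMs repository.

```python
from typing import List, Tuple


def quantize_linear(values: List[float], num_bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric per-tensor linear quantization (an assumed variant).

    Each value x is mapped to the integer code clamp(round(x / scale)),
    where scale is chosen so the largest magnitude maps to qmax.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 for an all-zero (or empty) input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_linear(codes: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real values."""
    return [c * scale for c in codes]


def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Quantize then dequantize, exposing the rounding error as floats."""
    codes, scale = quantize_linear(values, num_bits)
    return dequantize_linear(codes, scale)


if __name__ == "__main__":
    weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
    for bits in (8, 4):
        approx = fake_quantize(weights, num_bits=bits)
        worst = max(abs(w - a) for w, a in zip(weights, approx))
        print(f"{bits}-bit worst-case error: {worst:.4f}")
```

In this sketch the same routine could be pointed at weights, activations, gradients, or optimizer states, since each is just a tensor of real values. Lower bit widths give a coarser grid, so the rounding error per value grows as `num_bits` shrinks, which is the trade-off between efficiency and accuracy that the study examines.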

Code Implementations: 1 repo