CL · Jun 3, 2024

Sparsity-Accelerated Training for Large Language Models

arXiv:2406.01392v2 · 28 citations · Has Code
AI Analysis

This work addresses the high cost of additional training for practitioners who use large language models. The contribution is incremental, since it builds on existing sparsity and scheduling techniques.

The paper tackles the high computational costs of additional training for large language models by leveraging sparsity in pre-trained models to accelerate the process, achieving a 45% throughput improvement in continual pre-training and saving 38% training time in supervised fine-tuning.

Large language models (LLMs) have demonstrated proficiency across various natural language processing (NLP) tasks but often require additional training, such as continual pre-training and supervised fine-tuning. However, the costs associated with this, primarily due to their large parameter count, remain high. This paper proposes leveraging sparsity in pre-trained LLMs to expedite this training process. By observing sparsity in activated neurons during forward iterations, we identify the potential for computational speed-ups by excluding inactive neurons. We address associated challenges by extending existing neuron importance evaluation metrics and introducing a ladder omission rate scheduler. Our experiments on Llama-2 demonstrate that Sparsity-Accelerated Training (SAT) achieves comparable or superior performance to standard training while significantly accelerating the process. Specifically, SAT achieves a 45% throughput improvement in continual pre-training and saves 38% training time in supervised fine-tuning in practice. It offers a simple, hardware-agnostic, and easily deployable framework for additional LLM training. Our code is available at https://github.com/OpenDFM/SAT.
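
The two ingredients named in the abstract, a step-wise ("ladder") omission rate and the exclusion of low-importance neurons, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the start and end rates, the number of rungs, and the direction of the schedule are all assumptions made for illustration, and a plain ReLU feed-forward layer stands in for the gated MLP used in Llama-2. The importance scores are taken as given, since the abstract does not say which metric is used.

```python
def ladder_omission_rate(step, total_steps, start_rate, end_rate, num_rungs):
    """Piecewise-constant schedule (hypothetical): the rate is held fixed on
    each rung and moves from start_rate to end_rate over num_rungs rungs."""
    if total_steps <= 0 or num_rungs < 1:
        raise ValueError("total_steps must be > 0 and num_rungs >= 1")
    step = min(max(step, 0), total_steps - 1)
    if num_rungs == 1:
        return start_rate
    rung = step * num_rungs // total_steps  # integer in 0 .. num_rungs - 1
    return start_rate + (end_rate - start_rate) * rung / (num_rungs - 1)


def select_active_neurons(importance, omission_rate):
    """Keep the highest-scoring neurons and omit the lowest
    floor(n * omission_rate) of them. Returns sorted kept indices."""
    n = len(importance)
    keep = n - int(n * omission_rate)
    order = sorted(range(n), key=lambda i: importance[i], reverse=True)
    return sorted(order[:keep])


def sparse_mlp_forward(x, w_in, w_out, active):
    """ReLU feed-forward layer that only evaluates the neurons in `active`.
    w_in[j] is the input weight row of hidden neuron j, w_out[j] its output row."""
    out = [0.0] * len(w_out[0])
    for j in active:
        h = max(0.0, sum(xi * wi for xi, wi in zip(x, w_in[j])))
        for k, w in enumerate(w_out[j]):
            out[k] += h * w
    return out


# Toy layer: 2 inputs, 4 hidden neurons, 1 output.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]
w_out = [[1.0], [1.0], [1.0], [1.0]]
importance = [1.0, 2.0, 0.0, 3.0]  # assumed scores; neuron 2 is inactive for x

dense = sparse_mlp_forward(x, w_in, w_out, range(4))
mild = sparse_mlp_forward(x, w_in, w_out, select_active_neurons(importance, 0.25))
harsh = sparse_mlp_forward(x, w_in, w_out, select_active_neurons(importance, 0.5))
```

In this toy case, omitting only the inactive neuron (rate 0.25) leaves the output identical to the dense result, while a rate of 0.5 also drops an active neuron and changes the output. That trade-off is the reason the omission rate needs to be scheduled with care rather than fixed at a high value.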

