LG AI CL · Feb 19, 2024

EBFT: Effective and Block-Wise Fine-Tuning for Sparse LLMs

Song Guo, Fan Wu, Lei Zhang, Xiawu Zheng, Shengchuan Zhang, Fei Chao, Yiyu Shi, Rongrong Ji

arXiv:2402.12419v1 · 11.57 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the cost of fine-tuning sparse LLMs, offering researchers and practitioners a method that reduces computational cost and improves performance. It is an incremental advance, since it builds on existing fine-tuning and sparsity techniques.

The paper tackles resource-intensive and suboptimal fine-tuning of sparse large language models (LLMs) by proposing EBFT, a framework that minimizes reconstruction error block by block. With LlamaV1-7B on Wikitext2 at 70% sparsity, EBFT reaches a perplexity of 16.88, against 75.14 for DSnoT. At a structured sparsity ratio of 26%, it reaches 16.27, against 16.44 for LoRA. Fine-tuning LlamaV1-7B takes about 30 minutes, and the whole framework runs on a single 16GB GPU.

Existing methods for fine-tuning sparse LLMs often suffer from resource-intensive requirements and high retraining costs. Additionally, many fine-tuning methods rely on approximations or heuristic optimization strategies, which may lead to suboptimal solutions. To address these issues, we propose an efficient and fast framework for fine-tuning sparse LLMs based on minimizing reconstruction error. Our approach samples a small dataset for calibration and uses backpropagation to iteratively optimize the reconstruction error block by block, aiming for optimal solutions. Extensive experiments on various benchmarks consistently demonstrate the superiority of our method over other baselines. For instance, on the Wikitext2 dataset with LlamaV1-7B at 70% sparsity, our proposed EBFT achieves a perplexity of 16.88, surpassing the state-of-the-art DSnoT with a perplexity of 75.14. Moreover, with a structured sparsity ratio of 26%, EBFT achieves a perplexity of 16.27, outperforming LoRA (perplexity 16.44). Furthermore, the fine-tuning process of EBFT for LlamaV1-7B takes only approximately 30 minutes, and the entire framework can be executed on a single 16GB GPU. The source code is available at https://github.com/sunggo/EBFT.
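The idea of minimizing reconstruction error for one block can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single linear layer stands in for a transformer block, uses magnitude pruning to build the mask, and replaces backpropagation with a hand-written gradient step in NumPy. The function names, shapes, learning rate, and step count are all hypothetical choices made for illustration.

```python
import numpy as np


def reconstruction_error(w, w_dense, calib_x):
    """Mean squared distance between sparse and dense block outputs."""
    diff = calib_x @ w.T - calib_x @ w_dense.T
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def finetune_sparse_block(w_dense, mask, calib_x, lr=0.05, steps=300):
    """Tune the unpruned weights of one (toy, linear) block so that its
    outputs on the calibration inputs match those of the dense block."""
    target = calib_x @ w_dense.T           # dense outputs to reconstruct
    w = w_dense * mask                     # start from the pruned weights
    n = calib_x.shape[0]
    for _ in range(steps):
        err = calib_x @ w.T - target       # shape (n, out_dim)
        grad = 2.0 * err.T @ calib_x / n   # gradient of the mean squared error
        w -= lr * grad * mask              # pruned entries stay exactly zero
    return w


# Toy setup (assumed sizes): 128 calibration samples, a 16 -> 8 linear block.
rng = np.random.default_rng(0)
calib_x = rng.standard_normal((128, 16))
w_dense = rng.standard_normal((8, 16))

# Magnitude pruning at 70% sparsity: keep only the largest 30% of weights.
threshold = np.quantile(np.abs(w_dense), 0.7)
mask = (np.abs(w_dense) > threshold).astype(w_dense.dtype)

err_before = reconstruction_error(w_dense * mask, w_dense, calib_x)
w_tuned = finetune_sparse_block(w_dense, mask, calib_x)
err_after = reconstruction_error(w_tuned, w_dense, calib_x)
print(f"reconstruction error: {err_before:.4f} -> {err_after:.4f}")
```

In a full model, this step would be repeated for each block in turn, which is why only one block's weights and activations need to be resident at a time. That is one plausible reading of how the method fits on a single GPU, not a detail confirmed by the abstract.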
