LG AI · Dec 2, 2024

RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy

Geonho Lee, Janghwan Lee, Sukjin Hong, Minsoo Kim, Euijai Ahn, Du-Seong Chang, Jungwook Choi

arXiv:2412.01129v313.49 citations · h-index: 8 · Has Code · AAAI

Originality: Incremental advance

AI Analysis

This work addresses the challenge of maintaining high accuracy in highly compressed 2-bit LLMs, which is crucial for efficient deployment in resource-constrained environments. The advance is incremental, as it builds on existing LoRA-based quantization error compensation methods.

The paper tackled the problem of low accuracy in 2-bit quantized large language models (LLMs) by proposing RILQ, a rank-insensitive LoRA-based quantization error compensation method, which improved accuracy across various quantizers and fine-tuning tasks, as demonstrated on LLaMA-2 and LLaMA-3 models.

Low-rank adaptation (LoRA) has become the dominant method for parameter-efficient LLM fine-tuning, with LoRA-based quantization error compensation (LQEC) emerging as a powerful tool for recovering accuracy in compressed LLMs. However, LQEC has underperformed in sub-4-bit scenarios, with no prior investigation into this limitation. We propose RILQ (Rank-Insensitive LoRA-based Quantization Error Compensation) to understand this fundamental limitation and boost 2-bit LLM accuracy. Based on a rank analysis revealing the rank-insensitive nature of the model-wise activation discrepancy loss, RILQ employs this loss to adjust adapters cooperatively across layers, enabling robust error compensation with low-rank adapters. Evaluations on LLaMA-2 and LLaMA-3 demonstrate RILQ's consistent improvements in 2-bit quantized inference across various state-of-the-art quantizers, as well as enhanced accuracy in task-specific fine-tuning. RILQ maintains computational efficiency comparable to existing LoRA methods, enabling adapter-merged weight-quantized LLM inference with significantly enhanced accuracy, making it a promising approach for boosting 2-bit LLM performance. Our code is available at https://github.com/aiha-lab/RILQ.
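To make the rank-sensitivity issue concrete, the following is a minimal NumPy sketch of weight-level LoRA-based quantization error compensation for a single linear layer. It is not the RILQ procedure: the adapters here come from a truncated SVD of the weight error, whereas the abstract describes RILQ as adjusting adapters across layers using a model-wise activation discrepancy loss. The quantizer (per-row round-to-nearest with 4 levels), the matrix sizes, and all function names are illustrative assumptions.

```python
import numpy as np


def quantize_2bit(w):
    """Hypothetical per-row uniform quantizer with 4 levels (2 bits)."""
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0                    # 4 levels span 3 steps
    scale = np.where(scale == 0, 1.0, scale)   # guard against constant rows
    codes = np.clip(np.round((w - lo) / scale), 0, 3)
    return codes * scale + lo                  # dequantized weights


def svd_adapters(err, rank):
    """Best rank-`rank` factors A @ B of the weight error (truncated SVD)."""
    u, s, vt = np.linalg.svd(err, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]


def activation_discrepancy(x, w_ref, w_cmp):
    """Output difference of one linear layer on calibration inputs x."""
    return np.linalg.norm(x @ w_ref.T - x @ w_cmp.T)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # stand-in for a full-precision weight
x = rng.standard_normal((16, 64))   # stand-in for calibration activations
q = quantize_2bit(w)

weight_err = {0: np.linalg.norm(w - q)}
act_err = {0: activation_discrepancy(x, w, q)}
for rank in (4, 16, 64):
    a, b = svd_adapters(w - q, rank)
    merged = q + a @ b              # adapter-compensated weight
    weight_err[rank] = np.linalg.norm(w - merged)
    act_err[rank] = activation_discrepancy(x, w, merged)

for rank in sorted(weight_err):
    print(rank, weight_err[rank], act_err[rank])
```

In this toy setting the weight error shrinks only as the adapter rank grows, which illustrates why a low-rank adapter can struggle to absorb 2-bit quantization error when the objective is defined on a single layer's weights.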
