LG AR CL PF | Feb 12, 2025

LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits

Zikai Zhou, Qizheng Zhang, Hermann Kumbong, Kunle Olukotun

arXiv:2502.08141v1 | 14.47 citations | h-index: 69 | ICML

Originality: Highly original

AI Analysis

This work is significant for developers and users of large language models in resource-constrained environments, providing an incremental yet important improvement in efficient fine-tuning methods.

The authors address the cost of fine-tuning large language models with LowRA, a framework that enables LoRA fine-tuning below 2 bits per parameter with minimal performance loss. Across 4 LLMs and 4 datasets, LowRA achieves a superior performance-precision trade-off above 2 bits, remains accurate down to 1.15 bits, and reduces memory usage by up to 50%.

Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization - mapping, threshold selection, and precision assignment - while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance-precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
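To make the abstract's vocabulary concrete, here is a minimal sketch of threshold-based quantization and of how a fractional average such as 1.15 bits per parameter can arise from mixed precision. It is not the LowRA implementation: the function names, thresholds, levels, and the 15/85 split between 2-bit and 1-bit blocks are all hypothetical values chosen for illustration.

```python
from bisect import bisect_right


def quantize(weights, thresholds, levels):
    """Map each weight to a code and a representative level.

    thresholds: sorted cut points that split the real line into buckets
                (the "threshold selection" part, values assumed here).
    levels:     one representative value per bucket
                (the "mapping" part, values assumed here).
    """
    assert len(levels) == len(thresholds) + 1
    codes = [bisect_right(thresholds, w) for w in weights]
    return codes, [levels[c] for c in codes]


def average_bits(block_bits):
    """Mean bits per parameter when equally sized blocks use different precisions."""
    return sum(block_bits) / len(block_bits)


# 2-bit example: 3 thresholds give 4 buckets, so each code fits in 2 bits.
codes, dequantized = quantize(
    [-0.9, -0.1, 0.2, 0.8],
    thresholds=[-0.5, 0.0, 0.5],
    levels=[-0.75, -0.25, 0.25, 0.75],
)

# Hypothetical precision assignment: 15 of 100 equal blocks at 2 bits, 85 at 1 bit.
avg = average_bits([2] * 15 + [1] * 85)
print(codes, dequantized, avg)
```

Under this assumed split, the average works out to 1.15 bits per parameter. The actual per-block assignment, thresholds, and mappings used by LowRA are described in the paper, not here.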
