Less is More: Resource-Efficient Low-Rank Adaptation
This work addresses efficiency issues in parameter-efficient fine-tuning for researchers and practitioners using large language, multimodal, and diffusion models, representing an incremental improvement over existing methods.
The paper tackles two problems that Low-Rank Adaptation (LoRA) faces when fine-tuning large models: high training cost and parameter interference. It proposes EffiLoRA, which shares a single A matrix across layers and updates B matrices selectively, reducing resource usage while improving performance on tasks such as commonsense reasoning and image generation.
Low-Rank Adaptation (LoRA) is a widely adopted parameter-efficient fine-tuning (PEFT) method for Large Language Models (LLMs), but it still incurs notable overhead and suffers from parameter interference on complex datasets. While recent works decouple the LoRA update matrices to exploit matrix-wise asymmetry, training costs remain high. We revisit LoRA from the perspective of inter-matrix and intra-layer parameter redundancy and propose Resource-Efficient Low-Rank Adaptation (EffiLoRA), a lightweight and generalizable approach for language, multimodal, and diffusion models. EffiLoRA employs a unified A matrix across all transformer layers and introduces a runtime selective update of the B matrices to dynamically trade off the system resource budget against model performance. EffiLoRA consistently outperforms LoRA across diverse modalities, including commonsense reasoning, visual instruction tuning, and image generation, demonstrating improved efficiency and robustness.
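To make the two ideas in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It counts trainable parameters for per-layer LoRA versus a single A matrix shared across layers, and picks a subset of B matrices to update under a budget. The function names, the layer dimensions, and the uniform random selection rule are all assumptions for illustration: the abstract does not state how EffiLoRA chooses which B matrices to update, and the sketch assumes every adapted layer has the same input dimension so that one A can be shared.

```python
import random


def lora_param_count(num_layers, d_in, d_out, rank):
    """Standard LoRA: every layer owns its own A (rank x d_in) and B (d_out x rank)."""
    return num_layers * (rank * d_in + d_out * rank)


def shared_a_param_count(num_layers, d_in, d_out, rank):
    """One A (rank x d_in) shared by all layers, plus one B (d_out x rank) per layer."""
    return rank * d_in + num_layers * d_out * rank


def select_b_to_update(num_layers, budget_fraction, rng):
    """Pick which layers' B matrices are trained in this step.

    Assumption: uniform random selection is a placeholder only; the
    selection criterion used by EffiLoRA is not given in the abstract.
    """
    k = max(1, round(num_layers * budget_fraction))
    return sorted(rng.sample(range(num_layers), k))


# Hypothetical configuration: 32 layers, 4096 x 4096 projections, rank 8.
per_layer = lora_param_count(32, 4096, 4096, 8)
shared = shared_a_param_count(32, 4096, 4096, 8)
print(per_layer, shared)  # 2097152 1081344

# With a 25% budget, 8 of the 32 B matrices are updated in a given step.
active = select_b_to_update(32, 0.25, random.Random(0))
print(len(active))  # 8
```

Under these assumed dimensions, sharing A roughly halves the adapter's parameter count, and the budget fraction further limits how many B matrices receive gradient updates in any one step.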