Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
This work targets parameter efficiency for researchers and practitioners who fine-tune LLMs on multiple tasks or in federated settings; its contribution is incremental, since it builds on existing LoRA methods.
The paper tackles inefficient parameter sharing in multi-task fine-tuning of large language models with Low-Rank Adaptation (LoRA). It proposes ALoRA and Fed-ALoRA, which share the B matrix across tasks or clients, and reports more balanced performance with comparable or superior average accuracy on reasoning and NLP datasets.
Large language models are often adapted using parameter-efficient techniques such as Low-Rank Adaptation (LoRA), formulated as $y = W_0x + BAx$, where $W_0$ denotes the pre-trained weights, $BA$ is the trainable low-rank update, and $x$ is the input to the adapted layer. While multi-adapter extensions often employ multiple LoRAs, prior studies suggest that the inner $A$ matrices are highly similar during training and thus suitable for sharing. We revisit this phenomenon and find that this similarity is largely attributable to identical initialization rather than shared knowledge, with $B$ playing a more critical role in knowledge encoding and transfer. Motivated by these insights, we propose \textbf{ALoRA}, an asymmetric multi-LoRA design with multiple $A$ matrices and a single shared $B$ for multi-task fine-tuning, and \textbf{Fed-ALoRA}, which shares $B$ across clients in federated fine-tuning under both homogeneous and heterogeneous settings, using a novel matrix decomposition strategy to accommodate heterogeneous ranks across clients. Experiments on commonsense reasoning, math reasoning, multi-task NLP, and federated NLP datasets demonstrate that our methods achieve more balanced performance across tasks with comparable or superior average accuracy relative to existing multi-LoRA approaches. Code is available at https://github.com/OptMN-Lab/ALoRA.
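
To make the asymmetric design concrete, the following is a minimal NumPy sketch of a single adapted layer computing $y = W_0x + BA_tx$ with one $A_t$ per task and a single shared $B$. It is an illustration under stated assumptions, not the authors' implementation: the class name `SharedBLoRALayer`, the selection of $A_t$ by an explicit task index, the independent Gaussian initialization of each $A_t$, and the zero initialization of $B$ are all hypothetical choices made here for clarity.

```python
import numpy as np


class SharedBLoRALayer:
    """Illustrative layer: y = W0 x + B A_t x, with per-task A_t and one shared B.

    Hypothetical sketch only; names and initialization are assumptions.
    """

    def __init__(self, d_in, d_out, rank, num_tasks, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for the pre-trained weight W0 (kept fixed in this sketch).
        self.W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # One A matrix per task, each drawn independently (assumption).
        self.A = [0.01 * rng.standard_normal((rank, d_in)) for _ in range(num_tasks)]
        # A single B shared by all tasks, zero-initialized as in standard LoRA,
        # so the adapted layer initially reproduces W0 x.
        self.B = np.zeros((d_out, rank))

    def forward(self, x, task):
        # Low-rank update uses the task-specific A and the shared B.
        return self.W0 @ x + self.B @ (self.A[task] @ x)

    def num_adapter_params(self):
        # Adapter parameters only: all A matrices plus the one shared B.
        return sum(a.size for a in self.A) + self.B.size


layer = SharedBLoRALayer(d_in=8, d_out=6, rank=2, num_tasks=3)
x = np.ones(8)
y = layer.forward(x, task=1)
```

With these sizes the adapter holds 3 × (2 × 8) + 6 × 2 = 60 parameters, whereas three fully independent LoRAs of the same rank would hold 3 × (2 × 8 + 6 × 2) = 84; the saving comes entirely from storing $B$ once.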