LG · AI · Mar 9

FedMomentum: Preserving LoRA Training Momentum in Federated Fine-Tuning

Peishen Yan, Yang Hua, Hao Wang, Jiaru Zhang, Xiaoyu Wu, Tao Song, Haibing Guan

arXiv:2603.08014v1 · 17.8 · h-index: 12

Predicted impact: top 7% in LG · last 90 days · Originality: Highly original

AI Analysis

This work improves the convergence speed and final accuracy of federated fine-tuning of large language models (LLMs) with LoRA. This benefits organizations and individuals who need to adapt LLMs to specific tasks while keeping their data private.

This paper addresses the "loss of training momentum" in federated LoRA fine-tuning of LLMs. Naive aggregation of LoRA modules introduces noise, while existing noise-free strategies compromise LoRA's structural expressiveness, so updates fail to accumulate effectively across rounds. The authors propose FedMomentum, a framework that applies SVD to the aggregated low-rank updates to extract their dominant components, enabling structured, momentum-preserving LoRA aggregation. FedMomentum consistently outperforms prior state-of-the-art methods in convergence speed and final accuracy across multiple tasks.

Federated fine-tuning of large language models (LLMs) with low-rank adaptation (LoRA) offers a communication-efficient and privacy-preserving solution for task-specific adaptation. Naive aggregation of LoRA modules introduces noise, because averaging the down-projection and up-projection matrices independently does not yield the average of their products. However, existing noise-free aggregation strategies inevitably compromise the structural expressiveness of LoRA, limiting its ability to retain client-specific adaptations by either improperly reconstructing the low-rank structure or excluding partially trainable components. We identify this problem as loss of training momentum, where LoRA updates fail to accumulate effectively across rounds, resulting in slower convergence and suboptimal performance. To address this, we propose FedMomentum, a novel framework that enables structured and momentum-preserving LoRA aggregation via singular value decomposition (SVD). Specifically, after aggregating low-rank updates in a mathematically correct manner, FedMomentum applies SVD to extract the dominant components that capture the main update directions. These components are used to reconstruct the LoRA modules with the same rank, while residual components can be retained and later merged into the backbone to preserve semantic information and ensure robustness. Extensive experiments across multiple tasks demonstrate that FedMomentum consistently outperforms prior state-of-the-art methods in convergence speed and final accuracy.
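The aggregation pipeline described in the abstract can be sketched in NumPy. This is a minimal, hypothetical illustration, not the authors' implementation. The helper names (`naive_aggregate`, `exact_update`, `svd_reconstruct`), the weighted-sum aggregation with client weights `weights`, the symmetric square-root split of singular values between the new B and A, the omission of LoRA's alpha/r scaling, and folding the whole residual into the frozen weight `W` in the same round are all assumptions.

```python
import numpy as np

def naive_aggregate(Bs, As, weights):
    """FedAvg-style: average B and A independently.
    The product of the averages is not the average of the products (noise)."""
    B = sum(w * B_k for w, B_k in zip(weights, Bs))
    A = sum(w * A_k for w, A_k in zip(weights, As))
    return B, A

def exact_update(Bs, As, weights):
    """Mathematically correct aggregate of the low-rank updates: sum_k w_k B_k A_k."""
    return sum(w * (B_k @ A_k) for w, B_k, A_k in zip(weights, Bs, As))

def svd_reconstruct(delta_w, rank):
    """Split an aggregated update into a rank-`rank` LoRA pair plus a residual.
    Assumption: singular values are split evenly (square root on each factor)."""
    U, S, Vt = np.linalg.svd(delta_w, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])
    B_new = U[:, :rank] * sqrt_s           # (d_out, r): scale columns of U_r
    A_new = sqrt_s[:, None] * Vt[:rank]    # (r, d_in): scale rows of V_r^T
    residual = delta_w - B_new @ A_new     # directions outside the top-r subspace
    return B_new, A_new, residual

# Toy round with hypothetical sizes and random client adapters.
rng = np.random.default_rng(0)
d_out, d_in, r, n_clients = 64, 32, 4, 3
weights = np.full(n_clients, 1.0 / n_clients)
Bs = [rng.normal(size=(d_out, r)) for _ in range(n_clients)]
As = [rng.normal(size=(r, d_in)) for _ in range(n_clients)]
W = rng.normal(size=(d_out, d_in))         # frozen backbone weight (hypothetical)

delta_exact = exact_update(Bs, As, weights)
B_avg, A_avg = naive_aggregate(Bs, As, weights)
noise = np.linalg.norm(B_avg @ A_avg - delta_exact)
print(f"naive-averaging error (Frobenius): {noise:.3f}")

B_new, A_new, residual = svd_reconstruct(delta_exact, r)
W_merged = W + residual                    # residual folded into the backbone
# B_new, A_new would initialize the next round's LoRA modules (same rank r).
```

Because the truncated SVD is the best rank-r approximation in the Frobenius norm (Eckart-Young theorem), `residual` is never larger than the naive-averaging error at the same rank, and merging it into `W` keeps the effective weight `W_merged + B_new @ A_new` equal to `W + delta_exact`.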

