LG | May 18, 2025

AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections

Xin Yu, Yujia Wang, Jinghui Chen, Lingzhou Xue

arXiv:2505.12455v213.06 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This addresses a memory efficiency problem for practitioners fine-tuning large models, but it is incremental as it builds on existing LoRA variants.

The paper tackles the sub-optimal performance of Low-Rank Adaptation (LoRA) in fine-tuning large language models by proposing AltLoRA, an alternating projection method that integrates momentum without increasing memory cost, achieving better results than LoRA variants and narrowing the gap to full fine-tuning.

Low-Rank Adaptation (LoRA) has emerged as an effective technique for reducing memory overhead in fine-tuning large language models. However, it often performs worse than full fine-tuning because the update is constrained to a low-rank space. Recent variants such as LoRA-Pro attempt to mitigate this by adjusting the gradients of the low-rank matrices to approximate the full gradient. However, LoRA-Pro's solution is not unique, and different solutions can lead to significantly different performance in ablation studies. Moreover, to incorporate momentum or adaptive optimization, approaches like LoRA-Pro must first compute the equivalent gradient, which raises the memory cost close to that of full fine-tuning. A key challenge remains: integrating momentum properly into the low-rank space at a lower memory cost. In this work, we propose AltLoRA, an alternating projection method that avoids the difficulties in gradient approximation caused by the joint update design while integrating momentum without higher memory complexity. Our theoretical analysis provides convergence guarantees and further shows that AltLoRA enables stable feature learning and robustness to transformation invariance. Extensive experiments across multiple tasks demonstrate that AltLoRA outperforms LoRA and its variants, narrowing the gap to full fine-tuning while preserving superior memory efficiency.
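
The abstract does not spell out the update rule, so the following is only a minimal sketch of one plausible reading of "alternating projection", not the authors' algorithm. It assumes the adapted weight is W0 + B A, with B of shape m x r and A of shape r x n, and that each half-step fixes one factor and picks the update of the other factor that best approximates the full gradient G in the least-squares sense. The function names, the toy quadratic loss, and the hyperparameters are all hypothetical, and momentum is left out entirely.

```python
import numpy as np

def low_rank_step_for_A(B, G):
    """With B fixed, return X (r x n) minimizing ||B @ X - G||_F."""
    return np.linalg.lstsq(B, G, rcond=None)[0]

def low_rank_step_for_B(A, G):
    """With A fixed, return Y (m x r) minimizing ||Y @ A - G||_F.

    Solved through the transposed system A.T @ Y.T ~ G.T.
    """
    return np.linalg.lstsq(A.T, G.T, rcond=None)[0].T

def run_demo(m=20, n=15, r=3, lr=0.5, steps=50, seed=0):
    """Toy problem (an assumption): fit W0 + B @ A to a target that
    differs from W0 by a rank-r matrix, using alternating updates."""
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((m, n))
    target = W0 + rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    # Small random init for both factors (not the usual LoRA zero init).
    B = 0.1 * rng.standard_normal((m, r))
    A = 0.1 * rng.standard_normal((r, n))

    def loss_and_grad(B, A):
        resid = W0 + B @ A - target       # also the full gradient dL/dW
        return 0.5 * np.sum(resid ** 2), resid

    initial_loss, _ = loss_and_grad(B, A)
    for _ in range(steps):
        _, G = loss_and_grad(B, A)
        A = A - lr * low_rank_step_for_A(B, G)   # update A with B fixed
        _, G = loss_and_grad(B, A)               # refresh gradient
        B = B - lr * low_rank_step_for_B(A, G)   # update B with A fixed
    final_loss, _ = loss_and_grad(B, A)
    return initial_loss, final_loss

if __name__ == "__main__":
    print(run_demo())
```

When B has full column rank, the least-squares solution for A equals (B^T B)^{-1} B^T G, and B^T G is the ordinary gradient with respect to A. The toy forms G explicitly only for clarity; under this reading, a practical version would need only the low-rank gradients and small r x r solves.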
