LG IT OC · Jun 2, 2025

MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation

Wei Shen, Zhang Yaxiang, Minhui Huang, Mengfan Xu, Jiawei Zhang, Cong Shen

arXiv:2506.01897v3 · 9.42 citations · h-index: 2

Originality: Highly original

AI Analysis

The paper addresses the memory constraints that researchers and practitioners face when adapting large language models, offering an incremental improvement over existing compression methods.

The paper tackles the high memory demands of full-parameter fine-tuning for large language models by proposing MLorc, a memory-efficient training paradigm that compresses and reconstructs momentum during training. According to the reported experiments, MLorc consistently outperforms other memory-efficient methods, matches or exceeds full fine-tuning at small ranks such as r=4, and generalizes across optimizers without compromising time or memory efficiency.
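As a rough illustration of why small ranks save memory, the hypothetical helper below counts the floats needed to store one momentum buffer for an m × n weight matrix, either densely or as rank-r SVD factors. The 4096 × 4096 shape and the choice to store the singular values separately are illustrative assumptions, not figures from the paper; actual savings also depend on how many optimizer states are compressed (Adam, for example, keeps two).

```python
from typing import Optional


def momentum_floats(m: int, n: int, rank: Optional[int] = None) -> int:
    """Floats needed to store one m x n momentum buffer (hypothetical helper).

    rank=None -> dense buffer, as in full fine-tuning.
    rank=r    -> rank-r factors: U (m x r), singular values (r), V^T (r x n).
    """
    if rank is None:
        return m * n
    return m * rank + rank + rank * n


# Illustrative layer shape (an assumption, not taken from the paper).
dense = momentum_floats(4096, 4096)
low_rank = momentum_floats(4096, 4096, rank=4)
print(dense, low_rank)  # 16777216 32772
```

For that assumed shape, the dense buffer needs 16,777,216 floats while the rank-4 factors need 32,772, roughly 512 times fewer.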

With the increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank Compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training to reduce memory consumption. Compared to LoRA, MLorc does not enforce a fixed-rank constraint on weight update matrices and thus enables full-parameter learning. Compared to GaLore, MLorc directly compresses the momentum rather than the gradients, thereby better preserving the training dynamics of full-parameter fine-tuning. We provide a theoretical guarantee for its convergence under mild assumptions. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning at small ranks (e.g., $r=4$), and generalizes well across different optimizers, all without compromising time or memory efficiency.
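The compress-and-reconstruct idea can be sketched for plain SGD with momentum. This is a minimal sketch, not the authors' algorithm: it assumes NumPy, uses an exact truncated SVD as the compressor, and the helper names (`compress`, `reconstruct`, `momentum_step`) are hypothetical. The paper's compression routine and its handling of other optimizers (such as Adam's second moment) may differ. What the sketch does show is the contrast drawn in the abstract: the full gradient enters the momentum uncompressed and the weight update is not restricted to rank r (unlike LoRA), while only the stored momentum is compressed (unlike GaLore, which compresses gradients).

```python
import numpy as np


def compress(matrix: np.ndarray, rank: int):
    """Keep only the top-`rank` SVD factors (the only state stored between steps)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :rank], s[:rank], vt[:rank, :]


def reconstruct(factors) -> np.ndarray:
    """Rebuild the dense rank-r momentum from its factors: U diag(s) V^T."""
    u, s, vt = factors
    return (u * s) @ vt


def momentum_step(weight, grad, factors, lr=1e-2, beta=0.9, rank=4):
    """One SGD-with-momentum step whose momentum is stored in rank-`rank` form."""
    # 1. Reconstruct the previous momentum (zeros on the first step).
    m_prev = np.zeros_like(weight) if factors is None else reconstruct(factors)
    # 2. Update the momentum with the full, uncompressed gradient.
    m_new = beta * m_prev + grad
    # 3. Apply the (generally full-rank) momentum to the weights.
    weight = weight - lr * m_new
    # 4. Re-compress the momentum; the dense copy is discarded after returning.
    return weight, compress(m_new, rank)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
factors = None
for _ in range(3):
    g = rng.standard_normal((64, 32))  # stand-in for a real gradient
    w, factors = momentum_step(w, g, factors, rank=4)
print([f.shape for f in factors])  # [(64, 4), (4,), (4, 32)]
```

Only the rank-r factors persist between steps; the dense momentum exists only transiently inside `momentum_step`. Because a truncated SVD gives the best rank-r approximation, each re-compression discards the momentum outside its top r directions, which is the trade-off the rank controls.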

