LG OC Oct 15, 2024

Subspace Optimization for Large Language Models with Convergence Guarantees

Yutong He, Pengrui Li, Yipeng Hu, Chuyan Chen, Kun Yuan

arXiv:2410.11289v224.838 citations h-index: 5 Has Code

Originality: Incremental advance

AI Analysis

This work addresses a critical issue for researchers and practitioners who use memory-efficient optimization for LLMs, though the advance is incremental because it builds on existing subspace methods.

The paper tackles the unclear convergence guarantees of subspace optimization algorithms such as GaLore for large language models. It shows that GaLore does not always converge and introduces GoLore, a variant that provably converges in stochastic settings, and it validates these results empirically.

Subspace optimization algorithms, such as GaLore (Zhao et al., 2024), have gained attention for pre-training and fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we reveal that GaLore does not always converge to the optimal solution and provide an explicit counterexample to support this finding. We further explore the conditions under which GaLore achieves convergence, showing that it does so when either (i) a sufficiently large mini-batch size is used or (ii) the gradient noise is isotropic. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in typical stochastic settings, even with standard batch sizes. Our convergence analysis extends naturally to other subspace optimization algorithms. Finally, we empirically validate our theoretical results and thoroughly test the proposed mechanisms. Code is available at https://github.com/pkumelon/Golore.
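
To make the contrast in the abstract concrete, the following is a minimal NumPy sketch of two ways to build a low-rank gradient projector: one from the top singular vectors of the (stochastic) gradient, as GaLore is generally described, and one drawn at random, as the name GoLore suggests. It is an illustration under stated assumptions, not the authors' implementation: the function names `svd_projector`, `random_projector`, and `project_and_lift` are hypothetical, the QR-of-Gaussian sampling scheme is an assumed choice, and optimizer state and projector refresh schedules are omitted.

```python
import numpy as np


def svd_projector(grad, rank):
    """GaLore-style projector: top-`rank` left singular vectors of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def random_projector(num_rows, rank, rng):
    """Random orthonormal projector (assumed scheme: QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((num_rows, rank)))
    return q


def project_and_lift(grad, projector):
    """Compress the gradient to `rank` rows, then map it back to full size."""
    low_rank = projector.T @ grad   # shape (rank, n): what the optimizer would store
    return projector @ low_rank     # shape (m, n): update applied to the weights


rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 6))  # stand-in for one stochastic gradient matrix

p_svd = svd_projector(grad, rank=2)
p_rand = random_projector(grad.shape[0], rank=2, rng=rng)

lifted_svd = project_and_lift(grad, p_svd)
lifted_rand = project_and_lift(grad, p_rand)
```

The SVD-based projector depends on the noisy gradient itself, while the random projector is independent of it. This sketch only shows that structural difference; the convergence claims are the paper's and are not demonstrated here.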

