LG · OC · Oct 15, 2024

Subspace Optimization for Large Language Models with Convergence Guarantees

arXiv:2410.11289v2 · 38 citations · h-index: 5 · Has Code
AI Analysis

The work addresses a critical issue for researchers and practitioners who use memory-efficient optimization for LLMs, though it is incremental in that it builds on existing subspace methods.

The paper tackles the unclear convergence guarantees of subspace optimization algorithms such as GaLore for large language models. It shows that GaLore does not always converge and introduces GoLore, a variant that provably converges in stochastic settings, and it backs both results with empirical validation.

Subspace optimization algorithms, such as GaLore (Zhao et al., 2024), have gained attention for pre-training and fine-tuning large language models (LLMs) due to their memory efficiency. However, their convergence guarantees remain unclear, particularly in stochastic settings. In this paper, we reveal that GaLore does not always converge to the optimal solution and provide an explicit counterexample to support this finding. We further explore the conditions under which GaLore achieves convergence, showing that it does so when either (i) a sufficiently large mini-batch size is used or (ii) the gradient noise is isotropic. More significantly, we introduce GoLore (Gradient random Low-rank projection), a novel variant of GaLore that provably converges in typical stochastic settings, even with standard batch sizes. Our convergence analysis extends naturally to other subspace optimization algorithms. Finally, we empirically validate our theoretical results and thoroughly test the proposed mechanisms. Code is available at https://github.com/pkumelon/Golore.
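The contrast the abstract draws between a gradient-derived projection and a random one can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the GaLore-style projector is built from the top singular vectors of the gradient, and that a random low-rank projection can be drawn as an orthonormal basis from the QR factorization of a Gaussian matrix; the paper's exact sampling scheme may differ. The function names `svd_projector`, `random_projector`, and `project_gradient` are hypothetical and do not come from the linked repository.

```python
import numpy as np


def svd_projector(grad, rank):
    """Assumed GaLore-style choice: top-`rank` left singular vectors of the gradient."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def random_projector(num_rows, rank, rng):
    """Assumed random choice: orthonormal basis from QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((num_rows, rank)))
    return q


def project_gradient(grad, projector):
    """Compress the gradient to (rank x n), then map it back to (m x n)."""
    low_rank = projector.T @ grad  # this small matrix is what an optimizer would store
    return projector @ low_rank


rng = np.random.default_rng(0)
grad = rng.standard_normal((8, 6))  # stand-in for one layer's stochastic gradient

p_svd = svd_projector(grad, rank=2)
p_rand = random_projector(8, 2, rng)

g_svd = project_gradient(grad, p_svd)
g_rand = project_gradient(grad, p_rand)

print("captured norm (SVD):   ", np.linalg.norm(g_svd))
print("captured norm (random):", np.linalg.norm(g_rand))
```

In this sketch the SVD-based projector always retains at least as much of the current gradient's Frobenius norm as any other orthonormal projector of the same rank. When the gradient is dominated by noise, that same projector tracks the noise, whereas the random projector is independent of the gradient; this is the intuition behind the abstract's distinction, not a restatement of the paper's proof.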

Code Implementations: 1 repo
