LG · SP · OC · Oct 24, 2024

On the Crucial Role of Initialization for Matrix Factorization

ETH Zurich
arXiv:2410.18965v3 · 13 citations · h-index: 7 · ICLR
Originality: Highly original
AI Analysis

This work addresses convergence bottlenecks in nonconvex optimization for matrix factorization and in the finetuning of large AI models. It is assessed as a significant but incremental improvement over existing methods.

This work tackles slow convergence in low-rank matrix factorization by introducing Nystrom initialization, which lets Scaled Gradient Descent achieve quadratic convergence in cases where only linear rates were previously known. It also extends the initialization to low-rank adapters (LoRA) for foundation models and reports superior performance on downstream tasks with models of up to 7B parameters.

This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nystrom initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nystrom initialization achieves quadratic convergence in cases where only linear rates were previously known. Furthermore, we extend this initialization to low-rank adapters (LoRA) commonly used for finetuning foundation models. Our approach, NoRA, i.e., LoRA with Nystrom initialization, demonstrates superior performance across various downstream tasks and model scales, from 1B to 7B parameters, in large language and diffusion models.
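To make the abstract's main idea concrete, here is a minimal NumPy sketch of ScaledGD with a Nystrom-style initialization on the symmetric, exactly-parametrized problem. It is an illustration under stated assumptions, not the authors' implementation. The objective 0.25 * ||X X^T - A||_F^2, the initialization X0 = A @ Omega with a Gaussian sketch Omega, the step size 0.5, and all function names and test data are assumptions made for this example.

```python
import numpy as np


def nystrom_init(A, r, rng, scale=1.0):
    """Nystrom-style initialization X0 = A @ Omega (assumed form), Omega Gaussian."""
    omega = scale * rng.standard_normal((A.shape[1], r))
    return A @ omega


def scaled_gd_step(X, A, eta):
    """One ScaledGD step on f(X) = 0.25 * ||X X^T - A||_F^2."""
    grad = (X @ X.T - A) @ X   # plain gradient of f
    gram = X.T @ X             # r x r preconditioner
    # grad @ inv(gram), computed with a linear solve (gram is symmetric).
    scaled_grad = np.linalg.solve(gram, grad.T).T
    return X - eta * scaled_grad


def factorize(A, r, eta=0.5, num_iters=60, seed=0):
    """Run ScaledGD from a Nystrom-style start; return X and relative errors."""
    rng = np.random.default_rng(seed)
    X = nystrom_init(A, r, rng)
    norm_A = np.linalg.norm(A)
    errors = [np.linalg.norm(X @ X.T - A) / norm_A]
    for _ in range(num_iters):
        X = scaled_gd_step(X, A, eta)
        errors.append(np.linalg.norm(X @ X.T - A) / norm_A)
    return X, errors


# Synthetic rank-r PSD target (hypothetical test data, not from the paper).
n, r = 40, 4
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((n, r)))
A = Q @ np.diag(np.linspace(1.0, 5.0, r)) @ Q.T

X, errors = factorize(A, r)
print(f"initial relative error: {errors[0]:.3e}")
print(f"final relative error:   {errors[-1]:.3e}")
```

Because X0 = A @ Omega lies in the column space of A, every iterate stays in that subspace. In this sketch's setting (rank of A equal to r, step size 0.5), the update then behaves like a Heron-type square-root iteration, which is one way to see why fast convergence is plausible here. This reasoning belongs to the sketch and does not restate the paper's proof.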

