LG NA OC | Feb 4, 2024

Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

arXiv:2402.02347v3 | 26.549 citations | h-index: 25 | Has Code | ICML

Originality: Incremental advance

AI Analysis

This work addresses the challenge of unstable and inefficient fine-tuning for users of large foundation models, though it is incremental as it builds on the existing LoRA method.

The paper tackles the problem of enhancing Low-Rank Adaptation (LoRA) fine-tuning for foundation models by introducing a preconditioner, resulting in significantly improved convergence, reliability, and robustness to hyperparameters like learning rate in experiments with large language and diffusion models.

Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method, which proposes to freeze pretrained model weights and update an additive low-rank trainable matrix. In this work, we study the enhancement of LoRA training by introducing an $r \times r$ preconditioner in each gradient step where $r$ is the LoRA rank. We theoretically verify that the proposed preconditioner stabilizes feature learning with LoRA under infinite-width NN setting. Empirically, the implementation of this new preconditioner requires a small change to existing optimizer code and creates virtually minuscule storage and runtime overhead. Our experimental results with both large language models and text-to-image diffusion models show that with this new preconditioner, the convergence and reliability of SGD and AdamW can be significantly enhanced. Moreover, the training process becomes much more robust to hyperparameter choices such as learning rate. The new preconditioner can be derived from a novel Riemannian metric in low-rank matrix field. Code can be accessed at https://github.com/pilancilab/Riemannian_Preconditioned_LoRA.
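The abstract does not spell out the update rule, so the following is only a minimal sketch of what an $r \times r$ preconditioned step could look like. It assumes the scaled-gradient form in which each LoRA factor's gradient is multiplied by the inverse Gram matrix of the other factor; that form, the damping term `delta`, the toy least-squares loss, and the names `preconditioned_lora_step` and `lsq_grads` are all illustrative assumptions and are not taken from the paper's repository.

```python
import numpy as np


def preconditioned_lora_step(A, B, grad_A, grad_B, lr=0.1, delta=1e-6):
    """One plain SGD step with r x r preconditioners on the LoRA factors.

    Assumed shapes: B is (m, r) and A is (r, n), so the adapter is B @ A.
    The preconditioner form is an assumption, not the paper's confirmed rule.
    """
    r = A.shape[0]
    eye = np.eye(r)
    # r x r Gram matrices of the *other* factor, damped so they stay invertible.
    P_A = B.T @ B + delta * eye  # applied to grad_A from the left
    P_B = A @ A.T + delta * eye  # applied to grad_B from the right
    new_A = A - lr * np.linalg.solve(P_A, grad_A)
    # grad_B @ inv(P_B): P_B is symmetric, so solve against the transpose.
    new_B = B - lr * np.linalg.solve(P_B, grad_B.T).T
    return new_A, new_B


def lsq_grads(A, B, target):
    """Gradients of the toy loss 0.5 * ||B @ A - target||_F^2 (hypothetical)."""
    residual = B @ A - target
    return B.T @ residual, residual @ A.T  # (grad_A, grad_B)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, n, r = 8, 6, 2
    B = rng.normal(size=(m, r))
    A = rng.normal(size=(r, n))
    target = rng.normal(size=(m, n))
    for step in range(5):
        loss = 0.5 * np.linalg.norm(B @ A - target) ** 2
        print(f"step {step}: loss = {loss:.4f}")
        grad_A, grad_B = lsq_grads(A, B, target)
        A, B = preconditioned_lora_step(A, B, grad_A, grad_B)
```

Under this assumed form, the only extra work per step is building and solving two $r \times r$ systems, which is consistent with the small storage and runtime overhead the abstract describes. With `delta=0`, rescaling the factors as `c * B` and `A / c` leaves the updated product `B @ A` unchanged, which is one way to see why such a step can be less sensitive to the learning rate than plain gradient descent on the factors.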
