LG · CL · Jan 8, 2024

Chain of LoRA: Efficient Fine-tuning of Language Models via Residual Learning

Princeton
arXiv:2401.04151v1 · 95 citations · h-index: 64
Originality: Incremental advance
AI Analysis

This paper addresses efficient fine-tuning of large language models. It offers a parameter-efficient method that narrows the performance gap between LoRA and full fine-tuning. The contribution is incremental, since it builds directly on existing LoRA techniques.

The paper tackles the generalization gap between low-rank adaptation (LoRA) and full-parameter fine-tuning of language models by introducing Chain of LoRA (COLA). COLA is an iterative framework that repeatedly merges each learned LoRA module into the model weights and then restarts optimization with a freshly initialized module. It reports consistent improvements over LoRA across OPT and Llama-2 models on seven benchmark tasks, without additional computational or memory cost.
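The merge-and-restart loop can be illustrated with a minimal sketch. It rests on loud assumptions and is not the authors' code. A toy quadratic loss 0.5·||W - target||² stands in for the language-model loss, and training is plain full-batch gradient descent. All helper names (`train_lora`, `chain_of_lora`, `axpy`, and so on) are hypothetical. Each link starts with B = 0 and a small random A, trains only B and A on top of the frozen W, merges W ← W + BA, and then begins a fresh module.

```python
import random

# Matrices are plain lists of rows so the sketch needs only the standard library.

def matmul(X, Y):
    """Return the matrix product X @ Y."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def axpy(X, Y, scale=1.0):
    """Return X + scale * Y, elementwise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def loss(W, target):
    """Toy stand-in for the fine-tuning loss: 0.5 * ||W - target||_F^2."""
    return 0.5 * sum((w - t) ** 2 for rw, rt in zip(W, target) for w, t in zip(rw, rt))

def train_lora(W, target, rank, steps, lr, rng):
    """Fit one LoRA module B @ A on top of a frozen W with plain gradient descent."""
    B = [[0.0] * rank for _ in W]                                      # B = 0: module starts as a no-op
    A = [[rng.uniform(-0.5, 0.5) for _ in W[0]] for _ in range(rank)]  # small random A
    for _ in range(steps):
        E = axpy(axpy(W, matmul(B, A)), target, -1.0)  # dLoss/d(BA) = W + BA - target
        grad_B = matmul(E, transpose(A))               # dLoss/dB = E A^T
        grad_A = matmul(transpose(B), E)               # dLoss/dA = B^T E
        B, A = axpy(B, grad_B, -lr), axpy(A, grad_A, -lr)
    return B, A

def chain_of_lora(W, target, rank, links, steps, lr, seed=0):
    """Train `links` LoRA modules in sequence, merging each one into W before the next."""
    rng = random.Random(seed)
    history = [loss(W, target)]
    for _ in range(links):
        B, A = train_lora(W, target, rank, steps, lr, rng)
        W = axpy(W, matmul(B, A))  # merge W <- W + B A; no extra parameters are kept
        history.append(loss(W, target))
    return W, history

if __name__ == "__main__":
    target = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
    W0 = [[0.0] * 3 for _ in range(3)]
    # Same budget for both runs: 900 gradient steps in total.
    B, A = train_lora(W0, target, rank=1, steps=900, lr=0.05, rng=random.Random(0))
    print("single LoRA loss:", loss(axpy(W0, matmul(B, A)), target))
    W, history = chain_of_lora(W0, target, rank=1, links=3, steps=300, lr=0.05)
    print("chain of LoRA loss after each link:", history)
```

In this toy setting, a single rank-1 module can never reach a loss below 0.5·(2² + 1²) = 2.5 on the target diag(3, 2, 1), because the best rank-1 approximation leaves out the two smaller singular values. Three chained rank-1 links with the same total step budget can go further, since each link fits part of the remaining residual. This is only an illustration of the residual-learning idea. It does not reproduce the paper's experiments.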

Fine-tuning is the primary methodology for tailoring pre-trained large language models to specific tasks. As model scale and task diversity expand, parameter-efficient fine-tuning methods are of paramount importance. One of the most widely used families of methods is low-rank adaptation (LoRA) and its variants. LoRA encodes the weight update as the product of two low-rank matrices. Despite its advantages, LoRA falls short of full-parameter fine-tuning in terms of generalization error on certain tasks. We introduce Chain of LoRA (COLA), an iterative optimization framework inspired by the Frank-Wolfe algorithm, to bridge the gap between LoRA and full-parameter fine-tuning without incurring additional computational costs or memory overheads. COLA employs a residual learning procedure: it merges learned LoRA modules into the pre-trained language model parameters and re-initializes optimization for newly created LoRA modules. We provide theoretical convergence guarantees as well as empirical results to validate the effectiveness of our algorithm. Across various models (OPT and Llama-2) and seven benchmark tasks, we demonstrate that COLA consistently outperforms LoRA without additional computational or memory costs.
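Some back-of-the-envelope arithmetic shows why a product of two low-rank matrices is cheap to train. It also shows why chaining modules can express a higher-rank total update while only one module is trained at a time. This is a sketch under stated assumptions: the 4096 × 4096 matrix and rank 8 are hypothetical sizes chosen for illustration, and the function names are not from the paper. The rank figure is only an upper bound, because the rank of a sum is at most the sum of the ranks.

```python
def full_update_params(d, k):
    """Trainable entries when the whole d x k weight matrix is fine-tuned."""
    return d * k

def lora_params(d, k, r):
    """Trainable entries of one LoRA module: B is d x r and A is r x k."""
    return d * r + r * k

def chain_rank_bound(d, k, r, links):
    """Upper bound on rank(B_1 A_1 + ... + B_T A_T): rank is subadditive and capped by min(d, k)."""
    return min(d, k, r * links)

if __name__ == "__main__":
    d = k = 4096  # hypothetical square projection matrix
    r = 8         # hypothetical LoRA rank
    print(full_update_params(d, k))                          # 16777216
    print(lora_params(d, k, r))                              # 65536
    print(lora_params(d, k, r) / full_update_params(d, k))   # 0.00390625
    print(chain_rank_bound(d, k, r, links=3))                # 24
```

Each link in the chain has the same trainable-parameter count, r·(d + k). After a link is merged, the weight matrix keeps its original shape, so no extra parameters accumulate. This is consistent with the abstract's claim of no additional memory overhead.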
