LG · Dec 28, 2025

Merge before Forget: A Single LoRA Continual Learning via Continual Merging

arXiv:2512.23017v1 · 2 citations · h-index: 3
Originality: Incremental advance
AI Analysis

The paper addresses efficient, interference-free adaptation of LLMs in continual learning scenarios. The advance is incremental, since it builds on existing LoRA techniques.

The paper tackles catastrophic forgetting in large language models during continual learning by proposing a method that orthogonally initializes and merges Low-Rank Adaptation (LoRA) updates into a single unified LoRA, achieving improved performance with constant memory complexity across tasks.

Parameter-efficient continual learning has emerged as a promising approach for large language models (LLMs) to mitigate catastrophic forgetting while enabling adaptation to new tasks. Current Low-Rank Adaptation (LoRA) continual learning techniques often retain and freeze previously learned LoRAs or generate data representations to overcome forgetting, typically using these to help new LoRAs learn new tasks. However, these methods ignore the fact that memory consumption grows with the number of tasks while storage space is limited, and they suffer from potential task interference because they lack an effective LoRA merging mechanism. In this paper, we propose a novel continual learning method that orthogonally initializes and sequentially merges LoRA updates into a single unified LoRA. Our method extracts an orthogonal basis from the previously learned LoRA to initialize the learning of each new task, and further exploits the intrinsic asymmetry of LoRA components by using a time-aware scaling mechanism to balance new and old knowledge during continual merging. Our approach maintains constant memory complexity with respect to the number of tasks, minimizes interference between past and new tasks via orthogonal basis initialization, and improves performance over asymmetric LoRA merging via adaptive scaling. We provide theoretical analysis to justify our design and conduct extensive experiments across diverse continual learning benchmarks using various Llama models, demonstrating the effectiveness and efficiency of our method.
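The abstract does not state the exact initialization or merging formulas, so the following is only a minimal NumPy sketch of the two ideas it names: starting a new task in a subspace orthogonal to the previous LoRA, and folding each new update into a single LoRA with a weight that depends on the task index. The function names `orthogonal_init` and `merge_step`, the QR-based projection, and the `1 / t` running-average weight are illustrative assumptions, not the authors' method. In particular, the sketch applies the same weight to both factors and does not model the asymmetry between LoRA components that the paper exploits.

```python
import numpy as np


def orthogonal_init(A_prev, rng):
    """Hypothetical initializer: return a new A (r x d) with orthonormal rows
    that are orthogonal to the row space of the previous A.

    Assumes A_prev has full row rank and that 2 * r <= d.
    """
    r, d = A_prev.shape
    # Orthonormal basis (columns of Q, shape d x r) for the row space of A_prev.
    Q, _ = np.linalg.qr(A_prev.T)
    # Random directions with their component in the old subspace projected out.
    G = rng.standard_normal((d, r))
    G -= Q @ (Q.T @ G)
    # Orthonormalize what remains; its span stays orthogonal to Q.
    Q_new, _ = np.linalg.qr(G)
    return Q_new.T


def merge_step(A_old, B_old, A_new, B_new, t):
    """Hypothetical time-aware merge: running average with weight 1 / t.

    t is the 1-based index of the task just learned. The merged factors keep
    the shapes of the inputs, so storage does not grow with the number of tasks.
    """
    lam = 1.0 / t  # assumed schedule; the paper's scaling rule is not given here
    A = (1.0 - lam) * A_old + lam * A_new
    B = (1.0 - lam) * B_old + lam * B_new
    return A, B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r, d_in, d_out = 4, 32, 16
    A1 = rng.standard_normal((r, d_in))   # LoRA down-projection after task 1
    B1 = rng.standard_normal((d_out, r))  # LoRA up-projection after task 1

    A2 = orthogonal_init(A1, rng)         # starting point for task 2
    B2 = np.zeros((d_out, r))             # B commonly starts at zero in LoRA
    # ... training on task 2 would update A2 and B2 here ...
    A, B = merge_step(A1, B1, A2, B2, t=2)
    print(A.shape, B.shape)
```

Whatever the number of tasks, only one pair of factors `(A, B)` is kept, which is the sense in which memory stays constant in this sketch.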

