LG CV Jan 30

Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA

Zhan Fa, Yue Duan, Jian Zhang, Lei Qi, Wanqi Yang, Yinghuan Shi

arXiv:2601.22828v1 2.71 citations h-index: 28

Originality: Highly original

AI Analysis

This work addresses efficient and effective continual learning for vision-language models, offering a computationally lightweight solution that mitigates catastrophic forgetting without a heavy inference burden or external dependencies.

The paper tackles catastrophic forgetting in vision-language continual learning by introducing a framework that restructures a single LoRA module into a decomposable Rank-1 Expert Pool, enabling dynamic, sparse, task-specific updates. The approach reports state-of-the-art results across multiple settings, reduces trainable parameters by 96.7% compared to the baseline method, and eliminates reliance on external datasets or task-ID discriminators.

Continual learning (CL) in vision-language models (VLMs) faces significant challenges in improving task adaptation and avoiding catastrophic forgetting. Existing methods usually impose a heavy inference burden or rely on external knowledge, whereas Low-Rank Adaptation (LoRA) has shown potential to reduce these issues through parameter-efficient tuning. However, directly using LoRA to alleviate catastrophic forgetting is non-trivial. We therefore introduce a novel framework that restructures a single LoRA module as a decomposable Rank-1 Expert Pool. Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [CLS] token. In addition, we propose an Activation-Guided Orthogonal (AGO) loss that orthogonalizes critical parts of the LoRA weights across tasks. This sparse composition and orthogonalization require fewer parameter updates, yielding domain-aware learning while minimizing inter-task interference and maintaining downstream task performance. Extensive experiments across multiple settings demonstrate state-of-the-art results on all metrics, surpassing zero-shot upper bounds in generalization. Notably, the method reduces trainable parameters by 96.7% compared to the baseline method and eliminates reliance on external datasets or task-ID discriminators. The merged LoRAs retain fewer weights and incur no additional inference latency, making the method computationally lightweight.
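To make the decomposition idea in the abstract concrete, here is a minimal sketch in plain Python. It assumes a rank-r LoRA update can be written as a sum of r rank-1 terms, Delta W = B A = sum_i b_i a_i^T, and treats each pair (b_i, a_i) as one "expert". The router (one key vector per expert, scored against a [CLS] embedding by dot product, top-k kept) is a hypothetical stand-in, since the abstract does not specify how selection is computed. The function names, shapes, and random initialization are illustrative assumptions rather than the authors' implementation, and the AGO loss is not modeled.

```python
import random

def outer(u, v):
    """Outer product u v^T as a nested list of shape len(u) x len(v)."""
    return [[ui * vj for vj in v] for ui in u]

def add(m1, m2):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def make_pool(d_out, d_in, rank, seed=0):
    """View a rank-`rank` LoRA as a pool of rank-1 experts (b_i, a_i).

    b_i plays the role of column i of B (length d_out),
    a_i plays the role of row i of A (length d_in).
    Random values are placeholders for learned weights.
    """
    rng = random.Random(seed)
    return [
        ([rng.gauss(0, 1) for _ in range(d_out)],
         [rng.gauss(0, 1) for _ in range(d_in)])
        for _ in range(rank)
    ]

def select_experts(cls_embedding, keys, k):
    """Hypothetical router (an assumption, not from the paper):
    score each expert by key . cls and keep the indices of the top k."""
    scores = [dot(key, cls_embedding) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def compose_update(pool, selected, d_out, d_in):
    """Sparse update: sum the rank-1 terms of the selected experts only."""
    delta = zeros(d_out, d_in)
    for i in selected:
        b, a = pool[i]
        delta = add(delta, outer(b, a))
    return delta

def merge(weight, delta):
    """Fold the update into the frozen weight, so inference uses a single
    matrix and needs no separate adapter pass."""
    return add(weight, delta)
```

Under these assumptions, selecting every expert reproduces the dense LoRA update B A, while selecting k < r experts yields an update of rank at most k. For example, with keys [[1, 0], [0, 1], [1, 1], [-1, 0], [0.5, 0.5]] and a [CLS] embedding [1.0, 2.0], select_experts(..., k=2) keeps experts 1 and 2.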
