Replay-Free Continual Low-Rank Adaptation with Dynamic Memory
This work addresses a problem relevant to AI practitioners: how to efficiently adapt large pre-trained models to new tasks over time. It is an incremental advance that combines low-rank adaptation with continual learning techniques.
The paper tackles catastrophic forgetting in continual learning for vision transformers by proposing DualLoRA, a parameter-efficient fine-tuning method that introduces orthogonal and residual low-rank adapters coordinated by a dynamic memory. The authors report significant improvements in accuracy, inference speed, and training efficiency across multiple benchmarks.
We revisit continual learning (CL), which enables pre-trained vision transformers (ViTs) to be fine-tuned sequentially on new downstream tasks over time. However, as the scale of these models increases, catastrophic forgetting becomes an increasingly serious challenge. Recent studies highlight a crossover between CL techniques and parameter-efficient fine-tuning (PEFT), which adapts a model to downstream tasks by training only a small set of parameters, as in low-rank adaptation (LoRA). While LoRA achieves faster convergence and requires fewer trainable parameters, it has seldom been explored in the context of continual learning. To address this gap, we propose a novel PEFT-CL method called Dual Low-Rank Adaptation (DualLoRA), which introduces both an orthogonal LoRA adapter and a residual LoRA adapter in parallel with the pre-trained weights in each layer. These components are orchestrated by a dynamic memory mechanism to strike a balance between stability and plasticity. Additionally, we propose a scheme to predict task identity with confidence and calibrate the model's outputs accordingly. On ViT-based models, we demonstrate that DualLoRA offers significant advantages in accuracy, inference speed, and computational efficiency during training over existing CL methods across multiple benchmarks.
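To make the phrase "adapters parallel to pre-trained weights" concrete, the following is a minimal sketch of a frozen linear map with two generic low-rank branches added to its output. It illustrates plain LoRA-style updates only. The orthogonality constraint, the dynamic memory, and the task-identity scheme are not specified in this abstract and are not modelled here. All function names, dimensions, and the zero initialization of `B` are illustrative assumptions rather than the authors' implementation.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_delta(B, A, x, scale=1.0):
    """Low-rank update scale * B (A x), with A of shape r x d_in and B of shape d_out x r."""
    return [scale * z for z in matvec(B, matvec(A, x))]

def dual_adapter_forward(W, adapters, x):
    """Output of the frozen weight W plus the sum of all parallel low-rank branches."""
    y = matvec(W, x)
    for B, A in adapters:
        y = [y_i + d_i for y_i, d_i in zip(y, lora_delta(B, A, x))]
    return y

def init_adapter(d_out, d_in, r, rng):
    """Common LoRA-style init (assumed here): A random, B zero, so the branch starts as a no-op."""
    A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return B, A

def lora_param_count(d_out, d_in, r):
    """Trainable parameters in one low-rank branch."""
    return r * (d_in + d_out)

rng = random.Random(0)
d_in, d_out, r = 4, 3, 2  # toy sizes, chosen only for illustration
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]  # frozen weight
x = [1.0, -2.0, 0.5, 3.0]

branch_a = init_adapter(d_out, d_in, r, rng)  # stand-in for the "orthogonal" adapter
branch_b = init_adapter(d_out, d_in, r, rng)  # stand-in for the "residual" adapter
y0 = dual_adapter_forward(W, [branch_a, branch_b], x)
```

With both `B` matrices at zero, `y0` equals the pre-trained output `matvec(W, x)`, so adaptation starts from the frozen model's behaviour. The parameter saving comes from the rank: for a hypothetical 768 x 768 layer with `r = 8`, `lora_param_count(768, 768, 8)` gives 12288 trainable parameters per branch, against 589824 in the full weight matrix.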