Continual Learning in Vision-Language Models via Aligned Model Merging
This paper addresses knowledge retention in continual learning for AI systems, but the contribution is incremental because it builds on existing model merging techniques.
The paper tackles catastrophic forgetting in continual learning for vision-language models by proposing a model merging approach with aligned weights, which reduces forgetting and improves generalization compared to sequential fine-tuning.
Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. Existing approaches attempt to mitigate catastrophic forgetting, but because they build upon this sequential procedure, a bias towards recent tasks persists. In this work, we present a new perspective based on model merging that maintains stability while retaining plasticity. Rather than only updating the model weights sequentially, we propose merging newly trained task parameters with previously learned ones, promoting a better balance between the two. To maximize the effectiveness of the merging process, we propose a simple mechanism that encourages the new weights to remain aligned with the previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs) and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.
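To make the contrast between sequential updating and merging concrete, the following is a minimal sketch, not the paper's actual procedure. The abstract does not state the merging rule, so the running average used here is an assumption, as are the names `merge_params`, `sequential_update`, and `num_prev_tasks`. The alignment mechanism is not modeled, since the abstract gives no details about it.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def sequential_update(previous: Params, new: Params) -> Params:
    """Sequential fine-tuning baseline: the new weights simply replace the old ones."""
    return {name: list(values) for name, values in new.items()}


def merge_params(previous: Params, new: Params, num_prev_tasks: int) -> Params:
    """Merge newly trained weights into the previously learned ones.

    Assumption: a running average in which `previous` already summarizes
    `num_prev_tasks` tasks, so the new task receives weight 1 / (num_prev_tasks + 1).
    """
    if num_prev_tasks < 1:
        raise ValueError("num_prev_tasks must be at least 1")
    if previous.keys() != new.keys():
        raise ValueError("both parameter sets must have the same names")

    alpha = 1.0 / (num_prev_tasks + 1)  # share given to the new task
    merged: Params = {}
    for name, prev_values in previous.items():
        new_values = new[name]
        if len(prev_values) != len(new_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - alpha) * p + alpha * n for p, n in zip(prev_values, new_values)
        ]
    return merged


if __name__ == "__main__":
    old = {"w": [0.0, 2.0]}
    fresh = {"w": [2.0, 4.0]}
    print(sequential_update(old, fresh))             # old weights are discarded
    print(merge_params(old, fresh, num_prev_tasks=1))  # old and new both contribute
```

Under this assumed rule, every task seen so far contributes equally to the merged weights, whereas the sequential baseline keeps only the most recent task's parameters, which is the bias towards recent tasks that the abstract describes.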