CV | AI | LG | May 30, 2025

Continual Learning in Vision-Language Models via Aligned Model Merging

arXiv:2506.03189v1 | 7 citations | h-index: 27
Originality: Incremental advance
AI Analysis

This paper addresses knowledge retention in continual learning for AI systems, but the advance is incremental because it builds on existing model merging techniques.

The paper tackles catastrophic forgetting in continual learning for vision-language models by proposing a model merging approach with aligned weights, which reduces forgetting and improves generalization compared to sequential fine-tuning.

Continual learning is conventionally tackled through sequential fine-tuning, a process that, while enabling adaptation, inherently favors plasticity over the stability needed to retain prior knowledge. While existing approaches attempt to mitigate catastrophic forgetting, a bias towards recent tasks persists as they build upon this sequential nature. In this work we present a new perspective based on model merging to maintain stability while still retaining plasticity. Rather than just sequentially updating the model weights, we propose merging newly trained task parameters with previously learned ones, promoting a better balance. To maximize the effectiveness of the merging process, we propose a simple mechanism that promotes learning aligned weights with previous ones, thereby avoiding interference when merging. We evaluate this approach on large Vision-Language Models (VLMs), and demonstrate its effectiveness in reducing forgetting, increasing robustness to various task orders and similarities, and improving generalization.
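The contrast between sequentially overwriting weights and merging them can be illustrated with a minimal sketch. The abstract does not state the merging rule, so the code below assumes plain linear interpolation between parameter sets. The names `merge_params`, `running_average_merge`, and `alpha` are hypothetical, and the alignment mechanism the authors mention is not modeled here. Treat this as one possible reading under those assumptions, not as the paper's method.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def merge_params(previous: Params, new_task: Params, alpha: float = 0.5) -> Params:
    """Assumed merge rule: (1 - alpha) * previous + alpha * new_task.

    alpha = 1.0 reproduces sequential fine-tuning (old weights overwritten);
    smaller alpha keeps more of the previously learned parameters.
    """
    if previous.keys() != new_task.keys():
        raise ValueError("both models must share the same parameter names")
    merged: Params = {}
    for name, old in previous.items():
        new = new_task[name]
        if len(old) != len(new):
            raise ValueError(f"size mismatch for parameter {name!r}")
        merged[name] = [(1.0 - alpha) * o + alpha * n for o, n in zip(old, new)]
    return merged


def running_average_merge(task_models: Iterable[Params]) -> Params:
    """Fold task models in one at a time so every task ends up equally weighted.

    With alpha = 1 / t at step t, the result equals the uniform average of all
    task models seen so far, which avoids favoring the most recent task.
    """
    merged: Params = {}
    for t, task_params in enumerate(task_models, start=1):
        if t == 1:
            merged = {name: list(values) for name, values in task_params.items()}
        else:
            merged = merge_params(merged, task_params, alpha=1.0 / t)
    if not merged:
        raise ValueError("at least one task model is required")
    return merged


if __name__ == "__main__":
    old = {"w": [1.0, 2.0]}
    new = {"w": [3.0, 6.0]}
    print(merge_params(old, new, alpha=0.5))
    print(running_average_merge([{"w": [0.0]}, {"w": [2.0]}, {"w": [4.0]}]))
```

In this sketch, `alpha` is the stability/plasticity dial: values near 1 favor the newest task, values near 0 favor retained knowledge. Averaging helps only when the new and old weights do not pull in opposing directions, which is presumably why the authors pair merging with a mechanism that encourages aligned weights.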

