LG | AI | CV | Nov 24, 2025

Merging without Forgetting: Continual Fusion of Task-Specific Models via Optimal Transport

arXiv:2511.19561v1 | 1 citation
Originality: Highly original
AI Analysis

This work targets practitioners building versatile, efficient multi-task systems. It offers a model-merging method that can incorporate new tasks incrementally.

The paper addresses the merging of models fine-tuned for different tasks, where naive parameter interpolation causes distribution shift in the feature space. It proposes OTMF, a method based on optimal transport theory, and reports state-of-the-art accuracy and efficiency on vision and language benchmarks.

Merging models fine-tuned for different tasks into a single unified model has become an increasingly important direction for building versatile, efficient multi-task systems. Existing approaches predominantly rely on parameter interpolation in weight space, which we show introduces significant distribution shift in the feature space and undermines task-specific knowledge. In this paper, we propose OTMF (Optimal Transport-based Masked Fusion), a novel model merging framework rooted in optimal transport theory to address the distribution shift that arises from naive parameter interpolation. Instead of directly aggregating features or weights, OTMF aligns the semantic geometry of task-specific models by discovering common masks applied to task vectors through optimal transport plans. These masks selectively extract transferable and task-agnostic components while preserving the unique structural identities of each task. To ensure scalability in real-world settings, OTMF further supports a continual fusion paradigm that incrementally integrates each new task vector without revisiting previous ones, maintaining a bounded memory footprint and enabling efficient fusion across a growing number of tasks. We conduct comprehensive experiments on multiple vision and language benchmarks, and results show that OTMF achieves state-of-the-art performance in terms of both accuracy and efficiency. These findings highlight the practical and theoretical value of our approach to model merging.
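The ingredients named in the abstract (task vectors, an optimal transport plan, a mask, and incremental fusion with bounded memory) can be illustrated with a small NumPy sketch. It is not the OTMF algorithm, whose details the abstract does not give. The code assumes a task vector is the difference between fine-tuned and pretrained weights, computes an entropic OT plan with standard Sinkhorn iterations, and merges incrementally with a plain running mean. The names `pairwise_sq_dists`, `sinkhorn_plan`, `shared_row_mask`, and `ContinualMerger`, and the diagonal-mass thresholding rule for the mask, are hypothetical and chosen only for illustration.

```python
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return (diff ** 2).sum(axis=-1)

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropic-regularized OT plan between uniform marginals (Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Rescale the cost to [0, 1] so exp(-cost / reg) does not underflow.
    kernel = np.exp(-cost / (cost.max() + 1e-12) / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def shared_row_mask(tv_a, tv_b, threshold=0.5):
    """HYPOTHETICAL mask rule (an assumption, not the paper's): flag row i as
    shared when the plan keeps at least `threshold` of row i's mass on (i, i)."""
    plan = sinkhorn_plan(pairwise_sq_dists(tv_a, tv_b))
    kept_mass = np.diag(plan) * plan.shape[0]  # each row of the plan sums to 1/n
    return kept_mass >= threshold

class ContinualMerger:
    """Keeps one running-mean task vector, so memory does not grow with tasks.
    A plain running mean stands in for the paper's fusion step (assumption)."""

    def __init__(self, pretrained):
        self.pretrained = pretrained
        self.merged_tv = np.zeros_like(pretrained)
        self.count = 0

    def add(self, finetuned):
        tv = finetuned - self.pretrained  # task vector of the new model
        self.count += 1
        self.merged_tv += (tv - self.merged_tv) / self.count  # incremental mean

    def weights(self):
        return self.pretrained + self.merged_tv

# Toy usage with random "weights"; real models would be merged layer by layer.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(6, 4))
merger = ContinualMerger(pretrained)
for task_id in range(3):
    finetuned = pretrained + 0.1 * rng.normal(size=pretrained.shape)
    if merger.count > 0:
        mask = shared_row_mask(merger.merged_tv, finetuned - pretrained)
        print(f"task {task_id}: {int(mask.sum())} of {mask.size} rows flagged as shared")
    merger.add(finetuned)
print(merger.weights().shape)
```

Each call to `add` touches only the incoming task vector and the single stored running mean, which is the sense in which earlier task vectors are never revisited. How OTMF actually turns transport plans into masks, and how those masks enter the fusion step, is specified in the paper rather than in this sketch.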
