Subspace-Boosted Model Merging
This work addresses the challenge of efficiently combining many expert models for multi-task learning, although it is incremental, since it builds on existing merging methods.
The paper tackles the diminishing returns of merging many specialized expert models. It identifies rank collapse in the task vector space as an explanation and introduces Subspace Boosting to preserve task vector rank, improving performance by more than 10% when merging up to 20 models on vision and language benchmarks.
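To make "rank collapse" concrete, here is a toy sketch under stated assumptions. It merges a single weight matrix with task arithmetic (summing task vectors τ_i = θ_i − θ_base) and measures the spectrum with the entropy-based effective rank of Roy and Vetterli (2007). The summary does not say which rank measure the paper uses, so this is a swapped-in, commonly used choice. The helper names `effective_rank`, `merged_task_vector`, and `toy_experts`, and the synthetic experts themselves (one shared update plus one private update each), are hypothetical.

```python
import numpy as np


def effective_rank(matrix: np.ndarray, rel_tol: float = 1e-10) -> float:
    """Entropy-based effective rank (Roy & Vetterli, 2007): exp of the Shannon
    entropy of the normalized singular value spectrum."""
    s = np.linalg.svd(matrix, compute_uv=False)
    if s.max() == 0.0:
        return 0.0
    s = s[s > rel_tol * s.max()]      # drop numerically zero directions
    p = s / s.sum()                   # spectrum as a probability distribution
    return float(np.exp(-np.sum(p * np.log(p))))


def merged_task_vector(base: np.ndarray, experts: list, lam: float = 1.0) -> np.ndarray:
    """Task arithmetic for one weight matrix: lam * sum_i (theta_i - theta_base)."""
    return lam * sum(w - base for w in experts)


def toy_experts(base: np.ndarray, n: int) -> list:
    """Hypothetical experts: each adds the same shared rank-1 update plus its own
    private rank-1 update along an orthogonal axis (requires n < base.shape[0])."""
    eye = np.eye(base.shape[0])
    shared = np.outer(eye[0], eye[0])
    return [base + shared + np.outer(eye[i + 1], eye[i + 1]) for i in range(n)]


rng = np.random.default_rng(0)
base = rng.standard_normal((32, 32))
for n in (1, 4, 16):
    tv = merged_task_vector(base, toy_experts(base, n))
    er = effective_rank(tv)
    # n + 1 directions are present, but the shared one grows like n while each
    # private one stays at 1, so the effective rank is only 2 * sqrt(n).
    print(f"{n:2d} experts: effective rank {er:.2f} of {n + 1} directions")
```

In this construction the exact rank keeps growing with every expert, yet the normalized spectrum concentrates on the shared direction, so the effective rank lags further and further behind the number of merged experts. Real task vectors are far less tidy; the example only illustrates how a spectrum-based measure can expose this kind of collapse.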
Model merging enables the combination of multiple specialized expert models into a single model capable of performing multiple tasks. However, merging an increasing number of specialized experts generally yields diminishing returns and reduced overall performance gains. In this work, we offer an explanation and analysis from a task arithmetic perspective, revealing that as more and more experts are merged (across numerous existing merging methods), the associated task vector space undergoes rank collapse. To mitigate this issue, we introduce Subspace Boosting, which operates on the singular value decomposed task vector space and maintains task vector ranks. Subspace Boosting improves merging efficacy by large margins of more than 10% for up to 20 expert models, evaluated on both vision and language benchmarks. Moreover, we propose employing Higher-Order Generalized Singular Value Decomposition to quantify task similarity, offering a new interpretable perspective on model merging.
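The abstract does not spell out the boosting rule, so the following is only one plausible reading of "operates on the singular value decomposed task vector space and maintains task vector ranks", not the authors' published procedure. It keeps the singular vectors of a merged task-vector matrix and lifts the weak tail of its spectrum so that those directions are not swamped by the dominant ones. The function name `subspace_boost_sketch`, the cumulative-share cutoff `beta`, and the clamping rule are all assumptions for illustration.

```python
import numpy as np


def subspace_boost_sketch(tv: np.ndarray, beta: float = 0.85, rel_tol: float = 1e-10) -> np.ndarray:
    """Hypothetical rank-preserving rescaling of one merged task-vector matrix.

    Keeps the singular vectors, finds the first index k at which the cumulative
    share of the non-zero singular values reaches `beta`, and raises every
    smaller non-zero singular value to s[k]. Exact zeros stay zero, so no new
    directions are introduced.
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must lie in (0, 1]")
    u, s, vt = np.linalg.svd(tv, full_matrices=False)   # s is sorted descending
    if s.max() == 0.0:
        return tv.copy()
    keep = s > rel_tol * s.max()                         # non-zero prefix of s
    share = np.cumsum(s[keep]) / s[keep].sum()
    k = int(np.searchsorted(share, beta))                # first share >= beta
    k = min(k, int(keep.sum()) - 1)                      # guard against rounding at beta = 1
    boosted = s.copy()
    boosted[keep] = np.maximum(s[keep], s[k])            # lift the weak tail
    return (u * boosted) @ vt                            # U diag(boosted) V^T


# A merged task vector whose spectrum decays quickly (illustrative values).
tv = np.diag([10.0, 4.0, 2.0, 1.0, 0.5, 0.0])
out = subspace_boost_sketch(tv, beta=0.85)
print(np.round(np.linalg.svd(out, compute_uv=False), 6))
```

With `beta=0.85` the cutoff lands on the third singular value (cumulative share 16/17.5, about 0.91), so the values 1 and 0.5 are raised to 2 while the exact zero stays zero: the rank is unchanged but the weak directions now carry more weight. In a full merge one would presumably apply such a step to each layer's merged task vector and add the result back to the base weights; that step is also an assumption here.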