LG · Dec 11, 2023

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

arXiv:2312.06173v1 · 35 citations · h-index: 36 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses interference issues in multi-task model fusion and offers a scalable way to construct efficient multi-task models, though it is incremental, since it builds on existing merging techniques such as task arithmetic.

The paper tackles the problem of interference when merging task-specific models fine-tuned from a common pre-trained model. It proposes a Concrete subspace learning method that identifies a shared low-dimensional subspace to reduce conflicts without significant performance loss, with experiments demonstrating its effectiveness in both the vision and language domains.
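"Concrete" refers to the continuous relaxation of discrete random variables (the Concrete, or Gumbel-softmax, distribution), which lets a binary keep/drop mask be learned with gradients. The sketch below is not taken from the authors' repository; it only illustrates how a relaxed binary mask can be sampled from per-parameter logits with the binary Concrete distribution. The function names, the temperature, the clamping constant, and the toy task vector are illustrative assumptions.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_concrete_mask(logits, temperature=0.5, rng=None, eps=1e-12):
    """Draw one relaxed binary mask from the binary Concrete distribution.

    logits[i] is the log-odds that parameter i is kept. Every entry of the
    returned mask lies in [0, 1]; as temperature -> 0 the samples approach
    hard 0/1 values, while larger temperatures give smoother masks.
    """
    rng = rng if rng is not None else random.Random()
    mask = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), eps), 1.0 - eps)
        logistic_noise = math.log(u) - math.log(1.0 - u)
        mask.append(sigmoid((logit + logistic_noise) / temperature))
    return mask


# Usage: mask a hypothetical task vector tau = theta_finetuned - theta_pretrained.
tau = [0.3, -0.1, 0.05, 0.2]
logits = [3.0, -3.0, 0.0, 1.0]
mask = sample_concrete_mask(logits, temperature=0.5, rng=random.Random(0))
masked_tau = [m * t for m, t in zip(mask, tau)]
hard_mask = [1 if m > 0.5 else 0 for m in mask]  # thresholded mask, e.g. after training
print(mask, masked_tau, hard_mask)
```

Because the noise is added before the sigmoid, the sample is a differentiable function of the logits, which is what makes gradient-based mask learning possible.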

Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been shown to be a cheap and scalable strategy for constructing a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that such a multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
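The bi-level structure described in the abstract can be illustrated with a toy sketch in plain Python. This is not the authors' implementation, and several pieces are assumptions made only to keep the example small and exact: each task's "performance" is replaced by a weighted squared distance to that task's fine-tuned weights; the inner level merges with task arithmetic, θ0 + λ·(m ⊙ Σ_t τ_t), and solves for λ in closed form; and the upper level runs gradient descent on the logits of a single mask m shared by all task vectors. The mask here is the deterministic sigmoid of its logits, so the Concrete sampling noise is omitted. All function names, the toy data, and the hyperparameters are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def merged_loss(theta0, finetuned, weights, mask, lam):
    """Toy upper-level objective: per-task weighted squared error of the
    merged model theta0 + lam * (mask * sum_t tau_t), summed over tasks."""
    d = len(theta0)
    task_sum = [sum(ft[j] - theta0[j] for ft in finetuned) for j in range(d)]
    merged = [theta0[j] + lam * mask[j] * task_sum[j] for j in range(d)]
    return sum(w[j] * (merged[j] - ft[j]) ** 2
               for ft, w in zip(finetuned, weights) for j in range(d))


def inner_best_lambda(S, W, B, mask):
    """Inner level: exact argmin over the scaling factor lam. The toy loss is
    sum_j W_j * (lam * v_j)**2 - 2 * lam * v_j * B_j + const, with v = mask * S."""
    v = [m * s for m, s in zip(mask, S)]
    num = sum(vj * bj for vj, bj in zip(v, B))
    den = sum(vj * vj * wj for vj, wj in zip(v, W))
    return num / den if den > 0 else 0.0


def learn_shared_mask(theta0, finetuned, weights, steps=300, lr=1.0, init_logit=2.0):
    """Upper level: gradient descent on one mask shared by all task vectors."""
    d = len(theta0)
    taus = [[ft[j] - theta0[j] for j in range(d)] for ft in finetuned]
    S = [sum(t[j] for t in taus) for j in range(d)]              # summed task vectors
    W = [sum(w[j] for w in weights) for j in range(d)]           # total weight per coordinate
    B = [sum(w[j] * t[j] for w, t in zip(weights, taus)) for j in range(d)]
    logits = [init_logit] * d
    for _ in range(steps):
        mask = [sigmoid(a) for a in logits]
        lam = inner_best_lambda(S, W, B, mask)
        # Envelope theorem: lam minimizes the loss for this mask, so the total
        # derivative w.r.t. the mask equals the partial derivative at fixed lam.
        for j in range(d):
            v_j = mask[j] * S[j]
            dloss_dmask = (2 * W[j] * lam * lam * v_j - 2 * lam * B[j]) * S[j]
            logits[j] -= lr * dloss_dmask * mask[j] * (1 - mask[j])
    mask = [sigmoid(a) for a in logits]
    return mask, inner_best_lambda(S, W, B, mask)


theta0 = [0.0, 0.0]
finetuned = [[2.0, 1.0], [-1.0, 2.0]]   # task 1 cares about coord 0, task 2 about coord 1
weights = [[1.0, 0.1], [0.1, 1.0]]
# steps=0 keeps a uniform mask, i.e. plain task arithmetic with a tuned lambda.
mask0, lam0 = learn_shared_mask(theta0, finetuned, weights, steps=0)
mask1, lam1 = learn_shared_mask(theta0, finetuned, weights)
print(merged_loss(theta0, finetuned, weights, mask0, lam0),
      merged_loss(theta0, finetuned, weights, mask1, lam1))
```

In this toy, each fine-tuned model also shifts the coordinate the other task cares about, so uniform task arithmetic must compromise; the learned shared mask rescales each coordinate separately and lowers the toy loss. The closed-form inner solve is only available because the toy loss is quadratic in λ; with real networks the inner problem would have to be solved approximately.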

Code Implementations: 1 repo