ATM: Improving Model Merging by Alternating Tuning and Merging
This work addresses the need for efficient model merging in settings such as federated learning and offers a lightweight refinement method. Its contribution is incremental, since it builds on existing task arithmetic approaches.
The paper treats model merging as a cost-efficient alternative to multitask learning and proposes ATM, an iterative method that alternates between tuning and merging. ATM improves performance across diverse vision tasks.
Model merging has emerged as a cost-efficient approximation to multitask learning. Among merging strategies, task arithmetic is notable for its simplicity and effectiveness. In this work, we provide a theoretical motivation for task vectors by highlighting that, under single-epoch full-batch gradient descent, they are equivalent to multitask gradients. This insight leads us to reinterpret model merging as a single step in an iterative procedure that Alternates between Tuning and Merging (ATM). We propose two applications of ATM: (1) as an alternative to multitask learning in scenarios where data sharing is restricted (e.g., federated settings), and (2) as a lightweight refinement step to improve existing model merging methods using a small validation set. Experiments across diverse vision tasks demonstrate the effectiveness of ATM.
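The equivalence stated in the abstract can be checked numerically on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' implementation: the linear-regression tasks, the mean-squared-error loss, and the names `grad_mse`, `finetune`, `atm`, `lr`, and `alpha` are all hypothetical choices made for this example. It shows that one round of tuning each task with a single full-batch gradient step and then merging by task arithmetic gives the same parameters as one gradient step on the summed multitask loss, and that repeating the round yields an ATM-style loop.

```python
import numpy as np

def loss_mse(theta, X, y):
    """Toy task loss (assumed for this sketch): 0.5 * mean squared error."""
    return 0.5 * np.mean((X @ theta - y) ** 2)

def grad_mse(theta, X, y):
    """Gradient of loss_mse with respect to theta."""
    return X.T @ (X @ theta - y) / len(y)

def finetune(theta, X, y, lr, steps=1):
    """Full-batch gradient descent on one task, starting from theta."""
    theta = theta.copy()
    for _ in range(steps):
        theta -= lr * grad_mse(theta, X, y)
    return theta

def atm(theta, tasks, lr, alpha, iterations, steps=1):
    """Alternate between per-task tuning and task-arithmetic merging."""
    for _ in range(iterations):
        # Tuning: each task starts from the current shared parameters.
        task_vectors = [finetune(theta, X, y, lr, steps) - theta for X, y in tasks]
        # Merging: add the scaled sum of task vectors to the shared parameters.
        theta = theta + alpha * np.sum(task_vectors, axis=0)
    return theta

# Three synthetic regression tasks (hypothetical data).
rng = np.random.default_rng(0)
tasks = [(rng.normal(size=(32, 4)), rng.normal(size=32)) for _ in range(3)]
theta0 = np.zeros(4)
lr, alpha = 0.1, 1.0 / len(tasks)

# One ATM iteration with a single full-batch step per task ...
merged = atm(theta0, tasks, lr, alpha, iterations=1)
# ... matches one gradient step on the summed task losses with step size lr * alpha.
multitask_step = theta0 - lr * alpha * sum(grad_mse(theta0, X, y) for X, y in tasks)
equivalent = np.allclose(merged, multitask_step)

# Running more iterations keeps lowering the summed loss on this toy problem.
def total_loss(theta):
    return sum(loss_mse(theta, X, y) for X, y in tasks)

refined = atm(theta0, tasks, lr, alpha, iterations=20)
```

With more than one tuning step per task (`steps > 1`) the task vectors are no longer exact gradients, so the equivalence is expected to hold only approximately; in this sketch that is where alternating between tuning and merging differs from one-shot merging.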