AI, CL | Jun 22, 2025

SE-Merging: A Self-Enhanced Approach for Dynamic Model Merging

Zijun Chen, Zhanpeng Zhou, Bo Zhang, Weinan Zhang, Xi Sun, Junchi Yan

arXiv:2506.18135v1 | 1 citation | h-index: 26 | IJCNN

Originality: Incremental advance

AI Analysis

This work targets researchers and practitioners who need better multi-task adaptation from merged models. It is an incremental advance, since it builds on existing model merging techniques.

The paper analyzes model merging from a representation perspective to explain and improve its multi-task abilities. The analysis identifies two key capabilities: distinguishing samples from different tasks, and adapting to the corresponding expert model for each sample. Building on these, the authors propose SE-Merging, a self-enhanced framework that adjusts merging coefficients dynamically without additional training, and they report significant performance improvements.

Model merging has gained increasing attention due to its intriguing property: interpolating the parameters of different task-specific fine-tuned models leads to multi-task abilities. However, despite its empirical success, the underlying mechanisms of model merging remain poorly understood. In this work, we delve into the mechanism behind model merging from a representation perspective. Our analysis reveals that model merging achieves multi-task abilities through two key capabilities: i) distinguishing samples from different tasks, and ii) adapting to the corresponding expert model for each sample. These two capabilities allow the merged model to retain task-specific expertise, enabling efficient multi-task adaptation. Building on these insights, we propose SE-Merging, a self-enhanced model merging framework that leverages these two characteristics to dynamically identify the corresponding task for each sample and then adaptively rescales the merging coefficients to further enhance task-specific expertise in the merged model. Notably, SE-Merging achieves dynamic model merging without additional training. Extensive experiments demonstrate that SE-Merging achieves significant performance improvements while remaining compatible with existing model merging techniques.
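The abstract does not say how the task is identified or how the coefficients are rescaled, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes task-arithmetic style merging (pretrained weights plus scaled task vectors), a toy one-layer model, and a softmax over negative L2 distances between the merged model's representation and each expert's representation. All function names and the `base` and `temperature` parameters are hypothetical.

```python
import numpy as np

def task_vectors(pretrained, experts):
    # Assumption: each expert is described by its offset from the pretrained weights.
    return [e - pretrained for e in experts]

def merge(pretrained, vectors, coeffs):
    # Task-arithmetic style merge: pretrained + sum_i coeff_i * vector_i.
    return pretrained + sum(c * v for c, v in zip(coeffs, vectors))

def represent(weights, x):
    # Toy stand-in for a model's hidden representation (one linear layer + tanh).
    return np.tanh(weights @ x)

def sample_coefficients(pretrained, experts, x, base=0.3, temperature=1.0):
    # Step 1: build a static merged model with uniform coefficients.
    vectors = task_vectors(pretrained, experts)
    merged = merge(pretrained, vectors, [base] * len(experts))
    h_merged = represent(merged, x)
    # Step 2: score each expert by how close its representation of x is
    # to the merged model's representation (hypothetical similarity choice).
    dists = np.array([np.linalg.norm(h_merged - represent(e, x)) for e in experts])
    logits = -dists / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Step 3: rescale so the coefficients keep the same total as the static merge.
    return base * len(experts) * weights

def dynamic_forward(pretrained, experts, x, **kwargs):
    # Per-sample merge: no gradient step or extra training is involved.
    coeffs = sample_coefficients(pretrained, experts, x, **kwargs)
    merged = merge(pretrained, task_vectors(pretrained, experts), coeffs)
    return represent(merged, x), coeffs
```

In this sketch the expert whose representation lies closest to the merged representation for a given sample receives the largest coefficient, and lowering `temperature` makes that preference sharper. When all experts are identical the coefficients fall back to the uniform `base` value.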
