LG CV | Nov 14, 2024

Rethinking Weight-Averaged Model-merging

Hu Wang, Congbo Ma, Ibrahim Almakky, Ian Reid, Gustavo Carneiro, Mohammad Yaqub

arXiv:2411.09263v5 | 12.56 citations | h-index: 10 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the interpretability of model merging for stakeholders concerned with safety and reliability, but it is incremental as it builds on existing techniques without introducing new methods.

The paper tackles the problem of understanding why weight-averaged model merging works effectively without additional training, providing empirical insights into its mechanisms through analyses of weight structures, comparisons across architectures and datasets, and studies on regularization effects.

Model merging, particularly through weight averaging, has shown surprising effectiveness in saving computation and improving model performance without any additional training. However, why and how this technique works remains unclear. In this work, we reinterpret weight-averaged model merging through the lens of interpretability and provide empirical insights into the underlying mechanisms that govern its behavior. We approach the problem from three perspectives: (1) we analyze the learned weight structures and demonstrate that model weights encode structured representations that help explain the compatibility of weight averaging; (2) we compare averaging in weight space and feature space across diverse model architectures (CNNs and ViTs) and datasets, to identify the circumstances under which each combination paradigm works more effectively; (3) we study the effect of parameter scaling on prediction stability, highlighting how weight averaging acts as a form of regularization that contributes to robustness. By framing these analyses in an interpretability context, our work contributes to a more transparent and systematic understanding of model merging for stakeholders interested in the safety and reliability of training-free model combination methods. The code is available at https://github.com/billhhh/Rethink-Merge.
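
The distinction between averaging in weight space and averaging in feature space can be illustrated with a small NumPy sketch. This is not the authors' code: the two-layer ReLU network, the random weights, and the helper names (`mlp_forward`, `average_weights`, `average_features`) are all assumptions made for illustration. It shows that the two paradigms coincide for a purely linear model but generally diverge once a nonlinearity is present.

```python
import numpy as np

def linear_forward(x, w):
    """Single linear layer without bias (hypothetical toy model)."""
    return x @ w

def mlp_forward(x, w1, w2):
    """Two-layer ReLU network without biases: relu(x @ w1) @ w2."""
    return np.maximum(x @ w1, 0.0) @ w2

def average_weights(models):
    """Weight-space merging: element-wise mean of corresponding parameters."""
    return tuple(np.mean(layer, axis=0) for layer in zip(*models))

def average_features(x, models):
    """Feature-space merging: mean of the individual models' outputs."""
    return np.mean([mlp_forward(x, *m) for m in models], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # batch of 4 inputs with 8 features

# Two independently initialised toy models (random weights stand in for
# trained ones; real merging usually assumes a shared initialisation).
model_a = (rng.normal(size=(8, 16)), rng.normal(size=(16, 3)))
model_b = (rng.normal(size=(8, 16)), rng.normal(size=(16, 3)))

# Linear case: averaging weights and averaging outputs give the same result.
w_a, w_b = model_a[0], model_b[0]
linear_weight_avg = linear_forward(x, (w_a + w_b) / 2)
linear_feature_avg = (linear_forward(x, w_a) + linear_forward(x, w_b)) / 2
linear_match = np.allclose(linear_weight_avg, linear_feature_avg)

# Nonlinear case: the two paradigms generally produce different outputs.
merged = average_weights([model_a, model_b])
out_weight_avg = mlp_forward(x, *merged)
out_feature_avg = average_features(x, [model_a, model_b])
gap = np.abs(out_weight_avg - out_feature_avg).max()

print("linear outputs identical:", linear_match)
print("max gap for ReLU MLP:", gap)
```

Weight-space merging yields a single model with the inference cost of one network, whereas feature-space averaging requires a forward pass through every constituent model.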
