LG, AI | Feb 5, 2024

Vanishing Feature: Diagnosing Model Merging and Beyond

arXiv:2402.05966v4 | 4 citations | h-index: 1 | Has Code | CPAL
Originality: Highly original
AI Analysis

The paper addresses a key bottleneck for practitioners: efficiently combining pre-trained neural networks. Its analysis has broad implications for both model merging and model pruning.

The paper tackles inconsistent performance in model merging by identifying the "vanishing feature" phenomenon, in which input-induced features diminish as they propagate through the merged model. It proposes a "Preserve-First Merging" (PFM) strategy that focuses on preserving early-layer features, enabling merged models to outperform the original models in advanced settings without post-training.

Model merging offers an efficient way to combine pre-trained neural networks but often suffers from inconsistent performance, especially when merging models with different initializations. We identify the "vanishing feature" phenomenon, where input-induced features diminish during propagation through the merged model, degrading performance. Through theoretical and empirical analysis, we reveal that this phenomenon underpins challenges like variance collapse and explains techniques like permutation-based merging, post-merging normalization, etc. We show that existing normalization strategies can be enhanced by precisely targeting the vanishing feature issue. Leveraging these insights, we propose the "Preserve-First Merging" (PFM) strategy, which focuses on preserving early-layer features, enabling the merged models, for the first time, to outperform the original models in advanced settings without post-training. Furthermore, we demonstrate that the vanishing feature phenomenon extends to other contexts, such as model pruning. Applying post-pruning normalization to mitigate the issue significantly improves one-shot pruning performance at high sparsity, offering a simple and effective post-pruning solution. The code is available at https://github.com/XingyuQu/VF.
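The shrinking activation scale that the abstract associates with variance collapse can be seen in a toy setting. The sketch below is not the paper's code or experimental setup. It assumes two bias-free ReLU MLPs with independent random weights (no training), merges them by plain weight averaging, and compares per-layer activation norms. All names (`random_layer`, `average_layers`, `forward_norms`, `demo`) and all sizes are hypothetical choices made for illustration.

```python
import math
import random


def random_layer(rng, d):
    # d x d weight matrix with variance 2/d per entry (He-style scaling),
    # which keeps the expected squared norm of ReLU activations stable.
    std = math.sqrt(2.0 / d)
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(d)]


def average_layers(wa, wb):
    # Naive merging: element-wise average of the two weight matrices.
    return [[0.5 * (a + b) for a, b in zip(ra, rb)] for ra, rb in zip(wa, wb)]


def forward_norms(layers, x):
    # Run a bias-free ReLU MLP and record the activation norm after each layer.
    norms = []
    h = x
    for w in layers:
        h = [max(0.0, sum(wi * hi for wi, hi in zip(row, h))) for row in w]
        norms.append(math.sqrt(sum(v * v for v in h)))
    return norms


def demo(d=128, depth=8, seed=0):
    rng = random.Random(seed)
    model_a = [random_layer(rng, d) for _ in range(depth)]
    model_b = [random_layer(rng, d) for _ in range(depth)]
    merged = [average_layers(wa, wb) for wa, wb in zip(model_a, model_b)]
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    na = forward_norms(model_a, x)
    nb = forward_norms(model_b, x)
    nm = forward_norms(merged, x)
    # Merged activation norm relative to the mean norm of the two endpoints.
    return [m / (0.5 * (a + b)) for m, a, b in zip(nm, na, nb)]


ratios = demo()
for layer, r in enumerate(ratios, start=1):
    print(f"layer {layer}: merged / endpoint activation norm = {r:.3f}")
```

Under these assumptions, averaging two independent weight matrices halves the per-entry variance, so the expected activation norm of the merged network falls by roughly a factor of 1/sqrt(2) per layer relative to the endpoints. Trained networks, biases, and normalization layers change this picture, which is where the paper's analysis applies.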
