LG | AI | Dec 20, 2024

Non-Uniform Parameter-Wise Model Merging

arXiv:2412.15467v1 | 1 citation | h-index: 12 | BigData
Originality: Incremental advance
AI Analysis

This work addresses a bottleneck in model merging for distributed or multi-model scenarios and offers a scalable solution, though it appears incremental because it builds on existing parameter-averaging methods.

The paper tackles the problem that merging machine learning models with different initializations or training trajectories can degrade performance. It introduces Non-uniform Parameter-wise Model Merging (NP Merge), which learns each parameter's contribution to the merged model via gradient-based optimization and, according to the authors, outperforms past methods in various settings.

Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in multiple settings, outperforming past methods. We also extend NP Merge to handle the merging of multiple models, showcasing its scalability and robustness.
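
The core idea in the abstract, learning one interpolation coefficient per parameter by gradient descent, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sigmoid parameterization of the coefficients, the toy linear models, the squared-error loss, the hand-written gradients, and all function names. Neuron alignment, which the abstract mentions as a separate step, is skipped.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def merge(theta_a, theta_b, logits):
    # Per-parameter interpolation: w_i * a_i + (1 - w_i) * b_i, with w_i = sigmoid(logit_i).
    return [sigmoid(l) * a + (1.0 - sigmoid(l)) * b
            for a, b, l in zip(theta_a, theta_b, logits)]

def predict(theta, row):
    # Toy linear model: dot product of parameters and features.
    return sum(t * x for t, x in zip(theta, row))

def mse(theta, xs, ys):
    return sum((predict(theta, row) - y) ** 2 for row, y in zip(xs, ys)) / len(xs)

def learn_merge_logits(theta_a, theta_b, xs, ys, steps=500, lr=0.5):
    # logit = 0 gives w = 0.5, so optimization starts from uniform averaging.
    logits = [0.0] * len(theta_a)
    n = len(xs)
    for _ in range(steps):
        merged = merge(theta_a, theta_b, logits)
        residuals = [predict(merged, row) - y for row, y in zip(xs, ys)]
        # Gradient of the MSE with respect to each merged parameter.
        grad_theta = [2.0 / n * sum(r * row[i] for r, row in zip(residuals, xs))
                      for i in range(len(merged))]
        # Chain rule: d merged_i / d logit_i = (a_i - b_i) * w_i * (1 - w_i).
        for i, (a, b) in enumerate(zip(theta_a, theta_b)):
            w = sigmoid(logits[i])
            logits[i] -= lr * grad_theta[i] * (a - b) * w * (1.0 - w)
    return logits

# Hypothetical toy setup: each base model is right about a different parameter.
theta_true = [2.0, -1.0]
theta_a = [2.0, 3.0]    # correct first parameter, wrong second
theta_b = [-2.0, -1.0]  # wrong first parameter, correct second
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [predict(theta_true, row) for row in xs]

uniform = [0.5 * (a + b) for a, b in zip(theta_a, theta_b)]
logits = learn_merge_logits(theta_a, theta_b, xs, ys)
learned = merge(theta_a, theta_b, logits)

loss_uniform = mse(uniform, xs, ys)
loss_learned = mse(learned, xs, ys)
print("uniform average loss:", loss_uniform)
print("learned merge loss:  ", loss_learned)
```

In this toy case a single shared coefficient cannot recover the target, because the best weight is close to 1 for the first parameter and close to 0 for the second; letting each parameter have its own coefficient is what makes the merge non-uniform. A real implementation would typically use an automatic differentiation framework rather than hand-derived gradients.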

