Demystifying Mergeability: Interpretable Properties to Predict Model Merging Success
This work addresses the challenge of predicting whether model merging will succeed. It gives machine learning practitioners interpretable diagnostics that could guide fine-tuning strategies, although the contribution is incremental, building on prior work on mergeability.
The paper examines which factors determine the success of merging separately fine-tuned models. It finds that mergeability depends on both the merging method and the partner tasks, and it identifies subspace overlap and gradient alignment as foundational prerequisites for compatibility.
Model merging combines knowledge from separately fine-tuned models, yet success factors remain poorly understood. While recent work treats mergeability as an intrinsic property, we show with an architecture-agnostic framework that it fundamentally depends on both the merging method and the partner tasks. Using linear optimization over a set of interpretable pairwise metrics (e.g., gradient L2 distance), we uncover properties correlating with post-merge performance across four merging methods. We find substantial variation in success drivers (46.7% metric overlap; 55.3% sign agreement), revealing method-specific "fingerprints". Crucially, however, subspace overlap and gradient alignment metrics consistently emerge as foundational, method-agnostic prerequisites for compatibility. These findings provide a diagnostic foundation for understanding mergeability and motivate future fine-tuning strategies that explicitly encourage these properties.
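To make the pairwise metrics named in the abstract more concrete, the following is a minimal sketch of how gradient L2 distance, gradient alignment, and subspace overlap might be computed for a pair of fine-tuned models. The function names, the use of cosine similarity for "gradient alignment", and the definition of subspace overlap via top-k left singular vectors of weight-update matrices are all assumptions made for illustration; the abstract does not specify the paper's exact definitions, and the inputs in the demo are synthetic.

```python
import numpy as np


def gradient_l2_distance(g_a: np.ndarray, g_b: np.ndarray) -> float:
    """Euclidean distance between two flattened gradient vectors."""
    return float(np.linalg.norm(g_a - g_b))


def gradient_alignment(g_a: np.ndarray, g_b: np.ndarray, eps: float = 1e-12) -> float:
    """Cosine similarity between two flattened gradients, in [-1, 1].

    Assumption: "alignment" is read here as cosine similarity.
    """
    denom = np.linalg.norm(g_a) * np.linalg.norm(g_b)
    return float(g_a @ g_b / max(denom, eps))


def subspace_overlap(delta_a: np.ndarray, delta_b: np.ndarray, k: int) -> float:
    """Overlap between the top-k left singular subspaces of two 2-D
    weight-update matrices (hypothetical definition).

    Returns ||U_a^T U_b||_F^2 / k, which is 1 for identical subspaces
    and 0 for mutually orthogonal ones. Requires k <= min(matrix shape).
    """
    u_a = np.linalg.svd(delta_a, full_matrices=False)[0][:, :k]
    u_b = np.linalg.svd(delta_b, full_matrices=False)[0][:, :k]
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)


if __name__ == "__main__":
    # Synthetic stand-ins for per-task gradients and weight updates;
    # these are random arrays, not data from any real model.
    rng = np.random.default_rng(0)
    grad_a, grad_b = rng.normal(size=1000), rng.normal(size=1000)
    delta_a, delta_b = rng.normal(size=(64, 32)), rng.normal(size=(64, 32))

    metrics = {
        "gradient_l2_distance": gradient_l2_distance(grad_a, grad_b),
        "gradient_alignment": gradient_alignment(grad_a, grad_b),
        "subspace_overlap_k8": subspace_overlap(delta_a, delta_b, k=8),
    }
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

In a setup like the one the abstract describes, a vector of such metrics for each model pair would serve as the input features for a linear model of post-merge performance. That fitting step is omitted here because its details are not given in the text.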