Model Merging by Output-Space Projection

Bethan Evans, Benjamin Etheridge, Stephen Roberts, Jared Tanner

arXiv:2605.29101 · 63.3 · h-index: 2

Predicted impact: top 31% in LG · last 90 days · Originality: Incremental advance

AI Analysis

Provides a principled framework with optimality guarantees for multi-task model merging, addressing the heuristic limitations of prior methods for practitioners who combine fine-tuned models.

Model merging is formulated as a convex quadratic program over residual updates, minimizing a squared-output calibration objective. The method matches or outperforms existing approaches in single-layer settings and shows consistent gains across language and vision benchmarks.
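The derivation below makes the "convex quadratic program over residual updates" concrete. It assumes a single linear layer and a merged residual restricted to the span of a chosen basis. The notation is ours and is only illustrative; the paper's exact parameterisation may differ.

```latex
% Assumed setup: pretrained weight W_0, fine-tuned weights W_t = W_0 + \Delta_t (t = 1..T),
% calibration inputs X_t (rows are samples), Gram matrices G_t = X_t^\top X_t,
% merged residual \Delta(c) = \sum_{k=1}^{K} c_k B_k for a chosen basis \{B_k\}.
\begin{aligned}
\mathcal{L}(c) &= \sum_{t=1}^{T} \bigl\| X_t \bigl(W_0 + \Delta(c)\bigr)^\top - X_t W_t^\top \bigr\|_F^2
  = \sum_{t=1}^{T} \operatorname{tr}\!\bigl((\Delta(c) - \Delta_t)\, G_t\, (\Delta(c) - \Delta_t)^\top\bigr) \\
  &= c^\top H c - 2\, b^\top c + E_0, \\
H_{kl} &= \sum_{t} \operatorname{tr}\bigl(B_k G_t B_l^\top\bigr), \qquad
b_k = \sum_{t} \operatorname{tr}\bigl(B_k G_t \Delta_t^\top\bigr), \qquad
E_0 = \sum_{t} \operatorname{tr}\bigl(\Delta_t G_t \Delta_t^\top\bigr).
\end{aligned}
```

H is a Gram matrix, so it is positive semidefinite and the problem is convex. Any c* with H c* = b is a global minimiser, and the minimum value is L(c*) = E_0 - b^T c*. Task arithmetic, which takes B_k = Δ_k and c_k = λ, is one feasible point of this program. The ratio 1 - L(c*)/E_0 = b^T c*/E_0 is one natural reading of "the fraction of residual energy captured by a basis". Whether it matches the paper's diagnostic exactly is an assumption.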

Model merging combines fine-tuned checkpoints into a single multi-task model without retraining. Existing methods, such as task arithmetic, model soups, TIES, and DARE, are computationally efficient and empirically successful, but they rely on heuristic design choices and lack formal optimality guarantees. We show that merging can be formulated as a convex quadratic programme over residual updates. This yields weights that minimise a squared-output calibration objective computed from calibration inputs and fine-tuned model outputs, and it subsumes existing methods as special cases. Our framework also yields a closed-form diagnostic, the fraction of residual energy captured by a chosen basis, that predicts downstream merge quality using only the calibration set. Empirically, the QP matches or outperforms existing methods in the single-layer setting, and we characterise when the optimal basis provides significant gains over the cheaper diagonal QP. We extend the framework to multi-layer merging via a sequential layer-wise algorithm and demonstrate consistent gains across language and vision benchmarks.
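Below is a minimal NumPy sketch of the single-layer case. It assumes the merged residual is restricted to the span of a user-chosen basis. The sketch first uses the task vectors themselves as the basis, then uses every weight coordinate. The helper names `merge_qp` and `output_loss`, and the energy-fraction score `1 - loss / energy0`, are hypothetical illustrations. They are not the authors' code, and the paper's own diagnostic and diagonal QP may be defined differently.

```python
import numpy as np


def merge_qp(W0, deltas, Xs, basis):
    """Hypothetical sketch: minimise sum_t ||X_t (W0 + sum_k c_k B_k)^T - X_t (W0 + D_t)^T||_F^2.

    W0: base weight (d_out, d_in); deltas[t]: task residual D_t = W_t - W0;
    Xs[t]: calibration inputs for task t (n_t, d_in); basis: list of (d_out, d_in) matrices.
    """
    grams = [X.T @ X for X in Xs]  # G_t = X_t^T X_t
    K = len(basis)
    H = np.zeros((K, K))
    b = np.zeros(K)
    energy0 = 0.0  # objective at c = 0, i.e. keeping the base weight W0
    for G, D in zip(grams, deltas):
        BG = [B @ G for B in basis]  # precompute B_k G_t
        for k in range(K):
            b[k] += np.sum(BG[k] * D)  # tr(B_k G_t D_t^T)
            for l in range(K):
                H[k, l] += np.sum(BG[k] * basis[l])  # tr(B_k G_t B_l^T)
        energy0 += np.sum((D @ G) * D)  # tr(D_t G_t D_t^T)
    # H is positive semidefinite, so any solution of H c = b is a global minimiser;
    # lstsq also copes with a rank-deficient basis.
    c, *_ = np.linalg.lstsq(H, b, rcond=None)
    loss = energy0 - b @ c  # minimum value of the quadratic
    captured = 1.0 - loss / energy0  # assumed "fraction of residual energy captured"
    W = W0 + sum(ck * B for ck, B in zip(c, basis))
    return W, c, loss, captured


def output_loss(W, W0, deltas, Xs):
    """Squared-output calibration error of merged weight W against each fine-tuned model."""
    return sum(np.sum((X @ W.T - X @ (W0 + D).T) ** 2) for X, D in zip(Xs, deltas))


rng = np.random.default_rng(0)
d_out, d_in, T, n = 4, 6, 3, 20
W0 = rng.normal(size=(d_out, d_in))
deltas = [0.1 * rng.normal(size=(d_out, d_in)) for _ in range(T)]
Xs = [rng.normal(size=(n, d_in)) for _ in range(T)]

# Basis = task vectors; task arithmetic with lambda = 1/T is one feasible point of this QP.
W_qp, c, loss, captured = merge_qp(W0, deltas, Xs, deltas)
W_ta = W0 + sum(deltas) / T
loss_qp = output_loss(W_qp, W0, deltas, Xs)
loss_ta = output_loss(W_ta, W0, deltas, Xs)
print(c, captured)
print(loss_qp <= loss_ta + 1e-9)  # True

# A richer basis (one matrix per weight coordinate) spans the task vectors, so it can only capture more.
full_basis = [np.eye(d_out * d_in)[i].reshape(d_out, d_in) for i in range(d_out * d_in)]
_, _, loss_full, captured_full = merge_qp(W0, deltas, Xs, full_basis)
print(captured_full >= captured - 1e-9)  # True
```

The sketch uses `lstsq` rather than `solve` because H becomes singular when basis elements are linearly dependent under the calibration Gram matrices. The coordinate basis illustrates the trade-off between a small, cheap basis and a richer one that captures more residual energy at higher cost.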
