CL LG · Nov 26, 2025

A Systematic Study of Model Merging Techniques in Large Language Models

Oğuz Kağan Hitit, Leander Girrbach, Zeynep Akata

arXiv:2511.21437v1 · 9.63 citations · h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses the problem of efficiently reusing fine-tuned LLMs, which matters to both researchers and practitioners. Its contribution is incremental: it documents the limitations of existing merging methods rather than introducing a new one.

The study systematically evaluated six model merging techniques on large language models (LLMs) and found that only the simple Task Arithmetic method reliably improved performance, while others often caused significant drops, indicating current methods do not transfer well to LLMs.
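For readers unfamiliar with Task Arithmetic, the following is a minimal sketch of the general idea: each fine-tuned checkpoint contributes a "task vector" (its weights minus the base weights), and the merged model is the base plus a scaled sum of those vectors. The function name `task_arithmetic`, the `scale` parameter, and the toy weights are illustrative assumptions, not the paper's code or configuration. Real checkpoints would hold tensors rather than plain lists.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flattened values.
Weights = Dict[str, List[float]]


def task_arithmetic(base: Weights, checkpoints: List[Weights], scale: float = 1.0) -> Weights:
    """Merge checkpoints as base + scale * sum(checkpoint - base)."""
    merged: Weights = {}
    for name, base_vals in base.items():
        # Accumulate the element-wise sum of task vectors for this parameter.
        summed = [0.0] * len(base_vals)
        for ckpt in checkpoints:
            for i, (ft, b) in enumerate(zip(ckpt[name], base_vals)):
                summed[i] += ft - b
        # Add the scaled sum of task vectors back onto the base weights.
        merged[name] = [b + scale * s for b, s in zip(base_vals, summed)]
    return merged


# Toy values chosen for illustration only.
base = {"w": [1.0, 2.0]}
ckpt_a = {"w": [1.5, 2.0]}
ckpt_b = {"w": [1.0, 1.0]}
merged = task_arithmetic(base, [ckpt_a, ckpt_b], scale=0.5)
print(merged)  # {'w': [1.25, 1.5]}
```

The scaling coefficient is a hyperparameter; the value 0.5 here is arbitrary and says nothing about the settings used in the study.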

Model merging combines multiple fine-tuned checkpoints into a single model without additional training, offering an attractive approach to reusing models and efficiently improving performance. However, it remains unclear whether the advantages reported for smaller models and classifiers generalize to LLMs. We present a large-scale, systematic evaluation of six state-of-the-art merging methods, including recent subspace methods, across four open-weight LLMs, twelve fine-tuned checkpoints per base model, and sixteen standard LLM benchmarks. Evaluating through standardized benchmarks, we measure both the probability that a merged model outperforms the base model and relative gains over the best individual checkpoint. Our results show that the oldest and simplest method, Task Arithmetic, is the only approach that reliably yields performance gains on LLMs. Other interference-aware and subspace merging methods typically result in significant performance drops. Our findings indicate that current merging techniques do not directly transfer to modern LLMs. This motivates the design of LLM-specific merging algorithms and merging-aware fine-tuning methods. Code will be released upon acceptance of this paper.
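The two evaluation quantities named in the abstract can be sketched as follows. This is one plausible reading, assuming the "probability" is an empirical win rate over a set of merged models and the "relative gain" is the fractional improvement over the best single checkpoint; the paper's exact definitions may differ. The function names and all scores below are made up for illustration.

```python
from typing import List


def outperform_probability(merged_scores: List[float], base_score: float) -> float:
    """Fraction of merged models that score strictly above the base model."""
    wins = sum(1 for s in merged_scores if s > base_score)
    return wins / len(merged_scores)


def relative_gain(merged_score: float, checkpoint_scores: List[float]) -> float:
    """Fractional improvement of a merged model over the best individual checkpoint."""
    best = max(checkpoint_scores)
    return (merged_score - best) / best


# Hypothetical benchmark accuracies, not results from the paper.
merged_scores = [0.62, 0.58, 0.66, 0.55]
base_score = 0.60
checkpoint_scores = [0.61, 0.64, 0.59]

p_win = outperform_probability(merged_scores, base_score)  # 2 of 4 merges beat the base
gain = relative_gain(0.66, checkpoint_scores)              # roughly 3% over the best checkpoint
print(p_win, gain)
```

A negative `relative_gain` means the merged model is worse than simply picking the best fine-tuned checkpoint, which is the kind of performance drop the abstract reports for most methods.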
