LG AI CL Feb 6, 2025

Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

arXiv:2502.04411v219.713 citations h-index: 22

Originality Highly original

AI Analysis

This work addresses the problem of efficiently merging multiple LLMs for enhanced performance in real-world reasoning tasks, offering a storage-efficient solution that mitigates parameter conflicts.

The paper tackles the performance degradation that parameter conflicts cause when finetuned LLMs are merged. It averages low-conflict layers and applies task-level expert routing to high-conflict layers. In experiments on LLaMA and Qwen models, this yields significant performance improvements at reduced storage and compute cost.
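The layer-wise split described above can be illustrated with a small sketch. It is not the authors' implementation: the conflict measure (fraction of coordinates where task vectors disagree in sign), the `threshold` value, and the names `sign_conflict` and `plan_merge` are all assumptions made for illustration, and layers are modeled as flat lists of floats.

```python
from typing import Dict, List, Tuple

Layer = List[float]
Model = Dict[str, Layer]


def sign_conflict(task_vectors: List[Layer]) -> float:
    """Fraction of coordinates whose nonzero entries disagree in sign.

    Assumed conflict measure for illustration only.
    """
    n = len(task_vectors[0])
    conflicts = 0
    for i in range(n):
        signs = {v[i] > 0 for v in task_vectors if v[i] != 0.0}
        if len(signs) > 1:
            conflicts += 1
    return conflicts / n


def plan_merge(
    base: Model, experts: List[Model], threshold: float = 0.3
) -> Tuple[Dict[str, Layer], Dict[str, List[Layer]]]:
    """Average low-conflict layers; keep high-conflict layers per expert for routing."""
    merged: Dict[str, Layer] = {}
    routed: Dict[str, List[Layer]] = {}
    for name, base_w in base.items():
        # Task vector = finetuned weights minus base weights.
        deltas = [[w - b for w, b in zip(e[name], base_w)] for e in experts]
        if sign_conflict(deltas) <= threshold:
            merged[name] = [
                b + sum(col) / len(col) for b, col in zip(base_w, zip(*deltas))
            ]
        else:
            routed[name] = [e[name] for e in experts]
    return merged, routed


# Toy example: "attn" deltas agree in sign, "mlp" deltas disagree everywhere.
base = {"attn": [0.0, 0.0, 0.0, 0.0], "mlp": [0.0, 0.0, 0.0, 0.0]}
expert_a = {"attn": [1.0, 1.0, 1.0, 1.0], "mlp": [1.0, 1.0, -1.0, -1.0]}
expert_b = {"attn": [3.0, 1.0, 1.0, 1.0], "mlp": [-1.0, -1.0, 1.0, 1.0]}
merged, routed = plan_merge(base, [expert_a, expert_b])
```

In this toy case the `attn` layer is averaged into a single shared layer, while the `mlp` layer is kept separately for each expert so that a router can choose among them at inference time.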

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation when the models are averaged. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs and fails to leverage the common knowledge shared by different models. In this work, we observe that different layers exhibit varying levels of parameter conflicts. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. To handle out-of-distribution samples, we select and merge appropriate experts based on the task uncertainty of the input data. We conduct extensive experiments on both LLaMA and Qwen with varying parameter scales, and evaluate on real-world reasoning tasks. Results demonstrate that our method consistently achieves significant performance improvements while incurring lower system cost than existing methods.
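A rough sketch of the dense-plus-sparse decoupling and the uncertainty-based merge may help make the idea concrete. The abstract does not specify the details, so everything here is an assumption: the dense expert is taken as the element-wise mean of the experts, each sparse expert keeps only the `keep` largest-magnitude residual entries, and task uncertainty is modeled as a temperature softmax over hypothetical task-classifier logits. All function names are illustrative.

```python
import math
from typing import List, Tuple

Layer = List[float]


def top_k_sparsify(delta: Layer, keep: int) -> Layer:
    """Zero out all but the `keep` largest-magnitude entries."""
    if keep <= 0:
        return [0.0] * len(delta)
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    kept = set(order[:keep])
    return [d if i in kept else 0.0 for i, d in enumerate(delta)]


def decouple(experts: List[Layer], keep: int) -> Tuple[Layer, List[Layer]]:
    """Split experts into one dense expert plus one sparse residual per expert."""
    dense = [sum(col) / len(col) for col in zip(*experts)]
    sparse = [
        top_k_sparsify([w - d for w, d in zip(e, dense)], keep) for e in experts
    ]
    return dense, sparse


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def merge_for_input(
    dense: Layer,
    sparse: List[Layer],
    task_logits: List[float],
    temperature: float = 1.0,
) -> Layer:
    """Weight each sparse expert by the (assumed) task probability of the input."""
    probs = softmax(task_logits, temperature)
    return [
        d + sum(p * s[i] for p, s in zip(probs, sparse))
        for i, d in enumerate(dense)
    ]


# Two toy experts for one high-conflict layer.
experts = [[1.0, 0.0, 4.0], [3.0, 0.0, 0.0]]
dense, sparse = decouple(experts, keep=1)
uncertain = merge_for_input(dense, sparse, task_logits=[0.0, 0.0])
confident = merge_for_input(dense, sparse, task_logits=[100.0, 0.0])
```

When the task logits are tied, the sparse residuals cancel and the result falls back to the dense expert. When the input clearly belongs to the first task, the first sparse expert dominates. Only the dense expert and the sparse residuals need to be stored, which is where the storage saving would come from under these assumptions.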
