CL · Oct 16, 2025

Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation

Xujun Peng, Anoop Kumar, Jingyu Wu, Parker Glenn, Daben Liu

arXiv:2510.14915v1 · 1 citation · h-index: 2 · EMNLP

Originality: Incremental advance

AI Analysis

This addresses reliability issues in industrial RAG systems, though it appears incremental as it builds on existing fine-tuning and merging techniques.

The paper tackles inconsistent outputs in Retrieval-Augmented Generation systems by proposing a layer-wise merging strategy, achieving a ~47.5% improvement in response similarity over the baseline.

Retrieval-Augmented Generation (RAG) systems leverage Large Language Models (LLMs) to generate accurate and reliable responses that are grounded in retrieved context. However, LLMs often generate inconsistent outputs for semantically equivalent inputs, a problem compounded by the scarcity of consistency-focused training data and the limitations of current fine-tuning techniques in enhancing output consistency. We propose a new approach combining systematic synthetic data generation, triplet loss for better embeddings, and a novel layer-wise model merging approach. Using consistency-aware weights derived from intermediate layer activations, our method effectively integrates knowledge from specialized models. Experimental results show that our merged model significantly enhances output consistency, achieving a ~47.5% improvement in response similarity over the baseline, thus offering a practical solution for increasing the reliability of an industrial RAG system.
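
The abstract does not give formulas, so the following is only a minimal sketch of how the two named ingredients could look, not the authors' implementation. It assumes a cosine-distance triplet loss with a hypothetical margin of 0.2, and it assumes each specialized model already has one scalar consistency score per layer (for example, the similarity of that layer's activations across paraphrased inputs). Turning those scores into per-layer merge weights with a softmax is likewise an assumption made for illustration; all function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss on cosine distance; the margin value is an assumption."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def layer_weights(scores, temperature=1.0):
    """Softmax over the models' consistency scores for a single layer (assumed scheme)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def merge_layerwise(models, consistency):
    """Weighted average of parameters, with separate weights for each layer.

    models:      list of dicts mapping layer name -> flat list of parameters
    consistency: list of dicts mapping layer name -> consistency score
                 (hypothetical: derived from intermediate activations)
    """
    merged = {}
    for name in models[0]:
        weights = layer_weights([c[name] for c in consistency])
        params = [m[name] for m in models]
        merged[name] = [
            sum(w * p[i] for w, p in zip(weights, params))
            for i in range(len(params[0]))
        ]
    return merged


if __name__ == "__main__":
    model_a = {"layer0": [1.0, 3.0], "layer1": [0.0, 0.0]}
    model_b = {"layer0": [3.0, 5.0], "layer1": [2.0, 2.0]}
    scores_a = {"layer0": 0.9, "layer1": 0.2}
    scores_b = {"layer0": 0.9, "layer1": 0.8}
    print(merge_layerwise([model_a, model_b], [scores_a, scores_b]))
```

In this toy setup, a layer where both models score equally is averaged evenly, while a layer where one model is more consistent is pulled toward that model's parameters. That is the intuition behind merging per layer rather than with a single global coefficient.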
