Adapting Large Language Models for Multi-Domain Retrieval-Augmented-Generation
This work addresses the challenge of making RAG systems more robust across multiple domains. The contribution is incremental: it builds on existing RAG methods and adds specific tuning improvements.
The paper tackles poor out-of-domain generalization in multi-domain Retrieval-Augmented Generation (RAG) for large language models. It introduces a diverse benchmark and tests typical tuning strategies, finding that sequence-level distillation improves out-of-domain performance where standard fine-tuning fails to generalize.
Retrieval-Augmented Generation (RAG) enhances LLM factuality, but multi-domain applications face challenges such as a lack of diverse benchmarks and poor out-of-domain generalization. The first contribution of this work is a diverse benchmark comprising a variety of question-answering tasks from 8 sources and covering 13 domains. Our second contribution is a systematic test of out-of-domain generalization for typical RAG tuning strategies. While our findings reveal that standard fine-tuning fails to generalize effectively, we show that sequence-level distillation with teacher-generated labels improves out-of-domain performance by providing more coherent supervision. Our findings highlight key strategies for improving multi-domain RAG robustness.
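To make the contrast between the two tuning strategies concrete, here is a minimal sketch of how the training targets differ. It is an illustration under stated assumptions, not the paper's implementation: the prompt template, the `RagExample` fields, and the `teacher` callable are hypothetical stand-ins, and the fine-tuning step itself is omitted. Standard fine-tuning pairs each RAG prompt with the dataset's gold label, whereas sequence-level distillation pairs the same prompt with an answer generated by a teacher model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class RagExample:
    """One QA item with retrieved context (hypothetical schema)."""
    question: str
    passages: List[str]  # retrieved documents for this question
    gold_answer: str     # original dataset label, often a short span


def build_prompt(example: RagExample) -> str:
    """Format passages and question into one prompt (assumed template)."""
    context = "\n".join(
        f"[{i + 1}] {passage}" for i, passage in enumerate(example.passages)
    )
    return f"Context:\n{context}\n\nQuestion: {example.question}\nAnswer:"


def build_standard_set(examples: List[RagExample]) -> List[Dict[str, str]]:
    """Standard fine-tuning data: the target is the gold label."""
    return [
        {"prompt": build_prompt(ex), "target": ex.gold_answer}
        for ex in examples
    ]


def build_distillation_set(
    examples: List[RagExample],
    teacher: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Sequence-level distillation data: the target is the teacher's output.

    `teacher` is any function mapping a prompt to generated text; in
    practice it would wrap a stronger LLM (an assumption of this sketch).
    """
    pairs = []
    for ex in examples:
        prompt = build_prompt(ex)
        pairs.append({"prompt": prompt, "target": teacher(prompt)})
    return pairs


if __name__ == "__main__":
    # Toy data and a stub teacher, for illustration only.
    data = [
        RagExample(
            question="Who wrote Hamlet?",
            passages=["Hamlet is a tragedy by William Shakespeare."],
            gold_answer="Shakespeare",
        )
    ]

    def stub_teacher(prompt: str) -> str:
        return "Hamlet was written by William Shakespeare."

    print(build_standard_set(data)[0]["target"])
    print(build_distillation_set(data, stub_teacher)[0]["target"])
```

Both sets share identical prompts and differ only in the target text. The student model would then be fine-tuned on the distillation pairs with an ordinary next-token loss, so the only change relative to standard fine-tuning is the source of the labels.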