Efficient Federated Search for Retrieval-Augmented Generation
The work addresses efficiency and scalability issues in RAG systems whose knowledge is spread across distributed repositories, and it represents an incremental improvement over single-database approaches.
The paper tackles inefficient retrieval in retrieval-augmented generation (RAG) when information is distributed across multiple sources. It introduces RAGRoute, a federated search mechanism that dynamically selects the relevant data sources for each query, reducing the number of queries by up to 77.5% and communication volume by up to 76.2%.
Large language models (LLMs) have demonstrated remarkable capabilities across various domains but remain susceptible to hallucinations and inconsistencies, which limits their reliability. Retrieval-augmented generation (RAG) mitigates these issues by grounding model responses in external knowledge sources. Existing RAG workflows often rely on a single vector database, which is impractical in the common setting where information is distributed across multiple repositories. We introduce RAGRoute, a novel mechanism for federated RAG search. RAGRoute dynamically selects relevant data sources at query time using a lightweight neural network classifier. Because it avoids querying every data source, this approach significantly reduces query overhead, improves retrieval efficiency, and minimizes the retrieval of irrelevant information. We evaluate RAGRoute on the MIRAGE and MMLU benchmarks and demonstrate that it retrieves relevant documents while reducing the number of queries. RAGRoute reduces the total number of queries by up to 77.5% and communication volume by up to 76.2%.
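The abstract does not specify the classifier's inputs, architecture, or training procedure, so the following is only a minimal sketch of query-time source routing under stated assumptions. It assumes a multi-label router that maps a query embedding to one relevance probability per source through a single ReLU hidden layer, then queries only the sources whose probability clears a threshold. The names `SourceRouter`, `federated_search`, and `query_reduction`, the source names `src_a`/`src_b`/`src_c`, the stub backends, and the hand-set weights are all hypothetical; a trained router would learn its weights from data rather than use these illustrative values.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class SourceRouter:
    """Hypothetical multi-label router: one ReLU hidden layer, one sigmoid output per source."""
    sources: List[str]           # output i scores sources[i]
    w_hidden: List[List[float]]  # shape: hidden_dim x embed_dim
    b_hidden: List[float]
    w_out: List[List[float]]     # shape: num_sources x hidden_dim
    b_out: List[float]

    def scores(self, query_emb: Vector) -> Dict[str, float]:
        # Hidden layer: h = ReLU(W_h q + b_h)
        hidden = [
            max(0.0, sum(w * x for w, x in zip(row, query_emb)) + b)
            for row, b in zip(self.w_hidden, self.b_hidden)
        ]
        # Output layer: an independent relevance probability for each source
        return {
            name: sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for name, row, b in zip(self.sources, self.w_out, self.b_out)
        }

    def select(self, query_emb: Vector, threshold: float = 0.5) -> List[str]:
        # Keep every source whose predicted relevance clears the threshold
        return [name for name, p in self.scores(query_emb).items() if p >= threshold]


def federated_search(
    router: SourceRouter,
    query_emb: Vector,
    backends: Dict[str, Callable[[Vector], List[str]]],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Send the query only to the selected sources and merge their hits."""
    selected = router.select(query_emb, threshold)
    hits: List[str] = []
    for name in selected:
        hits.extend(backends[name](query_emb))
    return selected, hits


def query_reduction(selected_counts: List[int], num_sources: int) -> float:
    """Fraction of source queries saved versus querying every source for every query."""
    baseline = num_sources * len(selected_counts)
    return 1.0 - sum(selected_counts) / baseline


# Illustrative, hand-set weights (not learned) for three hypothetical sources
router = SourceRouter(
    sources=["src_a", "src_b", "src_c"],
    w_hidden=[[1.0, 0.0], [0.0, 1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[4.0, -4.0], [-4.0, 4.0], [1.0, 1.0]],
    b_out=[-2.0, -2.0, -3.0],
)
# Stub backends standing in for per-source vector databases
backends = {name: (lambda q, n=name: [f"{n}:doc0"]) for name in router.sources}

queries = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
picked = [federated_search(router, q, backends)[0] for q in queries]
print(picked)  # [['src_a'], ['src_b'], ['src_c']]
# 3 of 9 possible source queries are issued, so the saving is roughly 0.667
print(query_reduction([len(p) for p in picked], len(router.sources)))
```

Each query here reaches a single source instead of all three, which is the mechanism behind the reported savings. `query_reduction` measures savings against a query-every-source baseline, which is one plausible reading of the abstract's "total number of queries" figure rather than the paper's confirmed metric. A production router might also need a fallback for queries where no source clears the threshold, since `select` can return an empty list.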