AI, Aug 27, 2025

MODE: Mixture of Document Experts for RAG

arXiv:2509.00100v1

Originality: Incremental advance

AI Analysis

MODE provides a practical solution for small and medium corpora where simplicity, speed, and topical focus are important, but the contribution is incremental because it builds on existing RAG methods.

The paper tackles the problem of excessive infrastructure in Retrieval-Augmented Generation (RAG) for small, domain-specific collections. It proposes MODE, a lightweight cluster-and-route retrieval method that matches or exceeds a dense-retrieval baseline in answer quality on HotpotQA and SQuAD corpora while reducing end-to-end retrieval time.

Retrieval-Augmented Generation (RAG) often relies on large vector databases and cross-encoders tuned for large-scale corpora, which can be excessive for small, domain-specific collections. We present MODE (Mixture of Document Experts), a lightweight alternative that replaces fine-grained nearest-neighbor search with cluster-and-route retrieval. Documents are embedded, grouped into semantically coherent clusters, and represented by cached centroids. At query time, we route to the top centroid(s) and retrieve context only within those clusters, eliminating external vector-database infrastructure and reranking while keeping latency low. On HotpotQA and SQuAD corpora with 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality while reducing end-to-end retrieval time. Ablations show that cluster granularity and multi-cluster routing control the recall/precision trade-off, and that tighter clusters improve downstream accuracy. MODE offers a practical recipe for small and medium corpora where simplicity, speed, and topical focus matter.
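The cluster-and-route idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes chunk embeddings are already computed, uses a plain cosine (spherical) k-means as a stand-in for whatever clustering the paper uses, and the names `build_index`, `retrieve`, `top_clusters`, and `toy_embeddings` are hypothetical.

```python
import numpy as np

def normalize(x):
    # L2-normalize along the last axis so dot products equal cosine similarity.
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)

def build_index(embeddings, n_clusters, n_iters=20, seed=0):
    """Group chunk embeddings into clusters and cache one centroid per cluster.

    Assumption: cosine k-means stands in for the paper's clustering step.
    """
    X = normalize(embeddings)
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen chunks.
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        labels = np.argmax(X @ centroids.T, axis=1)
        for c in range(n_clusters):
            members = X[labels == c]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[c] = normalize(members.mean(axis=0))
    labels = np.argmax(X @ centroids.T, axis=1)
    return {"X": X, "centroids": centroids, "labels": labels}

def retrieve(index, query_embedding, top_clusters=1, top_k=3):
    """Route the query to its nearest centroid(s), then rank only chunks in those clusters."""
    q = normalize(query_embedding)
    centroid_scores = index["centroids"] @ q
    routed = np.argsort(-centroid_scores, kind="stable")[:top_clusters]
    candidates = np.flatnonzero(np.isin(index["labels"], routed))
    scores = index["X"][candidates] @ q
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(int(candidates[i]), float(scores[i])) for i in order]

# Toy 2-D "embeddings" (illustrative values only, not real model outputs).
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
]

if __name__ == "__main__":
    index = build_index(toy_embeddings, n_clusters=2)
    print(retrieve(index, [0.05, 0.95], top_clusters=1, top_k=2))
```

Raising `top_clusters` widens the candidate pool (more recall, less topical focus), while raising `n_clusters` makes each cluster tighter, which mirrors the granularity and multi-cluster routing trade-off the abstract mentions. With `top_clusters` equal to the number of clusters, the sketch degenerates to brute-force cosine search over all chunks.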
