IR · AI · May 28, 2025

Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models

arXiv:2506.00049v1 · 4 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses embedding model selection for RAG systems and offers a more efficient approach that may reduce computational requirements while improving retrieval accuracy. It is incremental because it builds on existing hybrid retrieval methods.

This paper tackles the selection of embedding models for Retrieval-Augmented Generation (RAG) systems by comparing MiniLM-v6 and BGE-Large in a tri-modal hybrid retrieval framework. It finds that the smaller MiniLM-v6 outperforms the larger BGE-Large when integrated with LLM-based re-ranking, with significant improvements in retrieval quality on the SciFact, FIQA, and NFCorpus datasets.

This paper presents a comparison of embedding models in tri-modal hybrid retrieval for Retrieval-Augmented Generation (RAG) systems. We investigate the fusion of dense semantic, sparse lexical, and graph-based embeddings, focusing on the performance of the MiniLM-v6 and BGE-Large architectures. Contrary to conventional assumptions, our results show that the compact MiniLM-v6 outperforms the larger BGE-Large when integrated with LLM-based re-ranking within our tri-modal hybrid framework. Experiments conducted on the SciFact, FIQA, and NFCorpus datasets demonstrate significant improvements in retrieval quality with the MiniLM-v6 configuration. The performance difference is particularly pronounced in agentic re-ranking scenarios, indicating better alignment between MiniLM-v6's embedding space and LLM reasoning. Our findings suggest that embedding model selection for RAG systems should prioritize compatibility with multi-signal fusion and LLM alignment, rather than relying solely on larger models. This approach may reduce computational requirements while improving retrieval accuracy and efficiency.
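The abstract describes fusing three retrieval signals (dense, sparse, graph-based) and then re-ranking the fused candidates with an LLM, but it does not say how the fusion is done. As a rough illustration only, the sketch below uses reciprocal rank fusion (RRF), a common rank-fusion technique that is swapped in here as an assumption and is not confirmed to be the paper's method. The function names, the constant `k=60`, the document IDs, and the `score_fn` stand-in for an LLM relevance call are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: RRF stands in for the paper's unspecified fusion step.
    Each list contributes 1 / (k + rank) to a document's fused score.
    """
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def rerank(candidates, score_fn, top_n=3):
    """Re-order the top_n fused candidates by a relevance score.

    score_fn is a hypothetical placeholder: in a real system it would
    wrap an LLM call that judges query-document relevance.
    """
    return sorted(candidates[:top_n], key=score_fn, reverse=True)


# Toy rankings from three hypothetical retrievers.
dense = ["d2", "d1", "d4"]   # e.g. a dense embedding model
sparse = ["d1", "d3", "d2"]  # e.g. a lexical retriever
graph = ["d1", "d2", "d5"]   # e.g. a graph-based signal

fused = reciprocal_rank_fusion([dense, sparse, graph])

# Made-up relevance scores standing in for LLM judgments.
toy_scores = {"d2": 0.9, "d1": 0.5}
reranked = rerank(fused, score_fn=lambda d: toy_scores.get(d, 0.0))
```

Under this setup, swapping the embedding model changes only the `dense` list. That gives one way to picture the paper's claim: what matters is how well the dense signal combines with the other signals and with the re-ranker, not the size of the embedding model alone.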
