CLAIIRLGMay 23, 2025

MetaGen Blended RAG: Unlocking Zero-Shot Precision for Specialized Domain Question-Answering

arXiv:2505.18247v31 citationsh-index: 2
Originality Highly original
AI Analysis

This addresses the challenge of achieving zero-shot precision for specialized domain question-answering in enterprise settings, offering a scalable solution without fine-tuning.

The paper tackles the problem of Retrieval-Augmented Generation (RAG) struggling with domain-specific enterprise datasets due to semantic variability and lack of fine-tuning, achieving 82% retrieval accuracy and 77% RAG accuracy on the biomedical PubMedQA dataset, surpassing prior zero-shot benchmarks.

Retrieval-Augmented Generation (RAG) struggles with domain-specific enterprise datasets, often isolated behind firewalls and rich in complex, specialized terminology unseen by LLMs during pre-training. Semantic variability across domains like medicine, networking, or law hampers RAG's context precision, while fine-tuning solutions are costly, slow, and lack generalization as new data emerges. Achieving zero-shot precision with retrievers without fine-tuning still remains a key challenge. We introduce 'MetaGen Blended RAG', a novel enterprise search approach that enhances semantic retrievers through a metadata generation pipeline and hybrid query indexes using dense and sparse vectors. By leveraging key concepts, topics, and acronyms, our method creates metadata-enriched semantic indexes and boosted hybrid queries, delivering robust, scalable performance without fine-tuning. On the biomedical PubMedQA dataset, MetaGen Blended RAG achieves 82% retrieval accuracy and 77% RAG accuracy, surpassing all prior zero-shot RAG benchmarks and even rivaling fine-tuned models on that dataset, while also excelling on datasets like SQuAD and NQ. This approach redefines enterprise search using a new approach to building semantic retrievers with unmatched generalization across specialized domains.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes