IR · Feb 23

A Systematic Study of Biomedical Retrieval Pipeline Trade-offs in Performance and Efficiency

arXiv:2604.20853 · h-index: 6
AI Analysis

The paper offers practical, evidence-based recommendations for researchers building biomedical retrieval systems, addressing a gap in systematic guidance.

This study provides empirical guidance on biomedical retrieval pipeline design by evaluating trade-offs in performance and efficiency across multiple datasets and configurations, finding that corpus aggregation yields the best retrieval quality and MedRAG/pubmed with HNSW indexing offers Pareto-optimal efficiency.

Retrieval systems are increasingly used in biomedical and clinical natural language processing applications, yet practical guidance for researchers building such systems is limited. In this work, we provide such guidance through an empirical study of how retrieval pipeline design choices affect performance and efficiency at scale. In particular, we examine retrieval over a variety of existing, public biomedical text datasets, using disparate types of queries (exam-style questions, conversational medical queries, community-asked questions, and non-question formulations) across retrieval pipeline settings spanning corpus selection, chunk granularity, and vector index configuration. Retrieval results are judged using a robust win-rate comparison in an LLM-as-a-judge setting with human validation. Across these experiments, we identify several points of concrete guidance for researchers, including the superiority of corpus aggregation for absolute retrieval quality, the emergence of MedRAG/pubmed as the Pareto-optimal singleton corpus under graph-based (HNSW) indexing, and the chunking strategies and FAISS indexing choices that offer the best trade-offs in speed and efficiency.
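
To make the term "Pareto-optimal" concrete, here is a minimal Python sketch of how a set of pipeline configurations could be filtered down to its Pareto front when each one is scored on quality (win rate, higher is better) and cost (query latency, lower is better). The `Config` type, the configuration names, and all numbers are hypothetical and chosen only for illustration; they are not results or code from the paper, and the paper may measure efficiency differently.

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str          # hypothetical label for a pipeline configuration
    win_rate: float    # retrieval quality, higher is better
    latency_ms: float  # query cost, lower is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.win_rate >= b.win_rate and a.latency_ms <= b.latency_ms
    strictly_better = a.win_rate > b.win_rate or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


# Made-up numbers for illustration only; NOT results from the paper.
candidates = [
    Config("corpus_a_flat", 0.55, 120.0),
    Config("corpus_a_hnsw", 0.54, 8.0),
    Config("corpus_b_hnsw", 0.48, 9.0),
    Config("aggregate_hnsw", 0.62, 40.0),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy data, one configuration survives because nothing matches its quality and another because nothing matches its speed, which mirrors the abstract's distinction between the best absolute quality and the best trade-off.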
