RAG-BioQA: Retrieval-Augmented Generation for Long-Form Biomedical Question Answering
This work addresses the need for accessible, evidence-based biomedical knowledge retrieval to support clinical decision-making, though the contribution is incremental because it builds on existing retrieval-augmented generation methods.
The paper tackles the problem of generating comprehensive, evidence-based long-form answers to biomedical questions, which existing systems do not provide, and reports significant improvements over baselines on the PubMedQA dataset, with gains in BLEU, ROUGE, and METEOR.
The exponential growth of biomedical literature creates significant challenges for accessing precise medical information. Current biomedical question-answering systems primarily focus on short-form answers and fail to provide the comprehensive explanations necessary for clinical decision-making. We present RAG-BioQA, a novel framework combining retrieval-augmented generation with domain-specific fine-tuning to produce evidence-based, long-form biomedical answers. Our approach retrieves candidate passages using BioBERT embeddings indexed with FAISS, compares several re-ranking strategies (BM25, ColBERT, MonoT5) to optimize context selection, and then synthesizes the selected evidence with a fine-tuned T5 model. Experimental results on the PubMedQA dataset show significant improvements over baselines, with our best model achieving substantial gains across BLEU, ROUGE, and METEOR metrics, advancing the state of accessible, evidence-based biomedical knowledge retrieval.
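The retrieve-then-re-rank flow described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. It assumes a toy bag-of-words vector in place of BioBERT embeddings, a brute-force inner-product search in place of a FAISS index, and a hand-written BM25 scorer as the re-ranker. The function names, the toy passages, the values of `k_retrieve` and `k_context`, and the `question: ... context: ...` input format for the generator are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use the model's own tokenizer.
    return text.lower().split()


def embed(text, vocab):
    # ASSUMPTION: toy stand-in for a BioBERT embedding (L2-normalized word counts).
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_retrieve(query, docs, vocab, k):
    # ASSUMPTION: brute-force inner-product search standing in for a FAISS index.
    q = embed(query, vocab)
    scored = []
    for i, doc in enumerate(docs):
        sim = sum(a * b for a, b in zip(q, embed(doc, vocab)))
        scored.append((sim, i))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best first, ties by index
    return [i for _, i in scored[:k]]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 computed over the candidate passages only.
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter(w for t in toks for w in set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for w in tokenize(query):
            if w not in tf:
                continue
            idf = math.log(1 + (n - df[w] + 0.5) / (df[w] + 0.5))
            denom = tf[w] + k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[w] * (k1 + 1) / denom
        scores.append(score)
    return scores


def retrieve_and_rerank(query, docs, k_retrieve=3, k_context=2):
    # Stage 1: dense retrieval. Stage 2: BM25 re-ranking of the candidates.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    candidates = dense_retrieve(query, docs, vocab, k_retrieve)
    scores = bm25_scores(query, [docs[i] for i in candidates])
    order = sorted(range(len(candidates)), key=lambda j: (-scores[j], j))
    return [candidates[j] for j in order[:k_context]]


def build_prompt(query, contexts):
    # ASSUMPTION: a common T5-style input layout, not confirmed by the source.
    return "question: " + query + " context: " + " ".join(contexts)


# Hypothetical toy passages, not drawn from PubMedQA.
docs = [
    "aspirin reduces risk of heart attack",
    "metformin lowers blood glucose in diabetes",
    "statins lower cholesterol levels",
    "aspirin may cause stomach bleeding",
]
query = "does aspirin reduce heart attack risk"
top = retrieve_and_rerank(query, docs)
prompt = build_prompt(query, [docs[i] for i in top])
```

In a fuller pipeline, `prompt` would be passed to the fine-tuned generator, and the BM25 scorer could be swapped for a neural re-ranker such as ColBERT or MonoT5 without changing the surrounding structure.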