IR · Mar 23

Overview of TREC 2025 Biomedical Generative Retrieval (BioGen) Track

Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, Kirk Roberts

arXiv:2603.21582 · 66.4 · h-index: 4

Predicted impact: top 36% in IR · last 90 days · Originality: Synthesis-oriented

AI Analysis

This work tackles the critical problem of inaccuracies in large language models (LLMs) for high-stakes biomedical applications, but it is incremental because it focuses on evaluation rather than on novel solutions.

The paper introduces the TREC 2025 BioGen Track to address hallucinations in LLMs used for biomedical tasks. The track aims to improve how well these models ground their output in verifiable sources by providing a new evaluation framework.

Recent advances in large language models (LLMs) have driven significant progress across multiple biomedical tasks, including biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. These models have demonstrated strong capabilities in processing and synthesizing complex biomedical information and in generating fluent, human-like responses. Despite these advances, hallucinations, or confabulations, remain a key challenge when using LLMs in biomedical and other high-stakes domains. Inaccuracies may be particularly harmful in high-risk situations, such as answering medical questions, making clinical decisions, or appraising biomedical research. Studies evaluating LLMs' ability to ground generated statements in verifiable sources have shown that models perform significantly

