RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions
This work addresses the slow adoption of clinical question answering systems by providing a more realistic dataset for health professionals, although the contribution is incremental, since it builds on existing QA datasets such as BioASQ.
The authors introduce RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM, to address the lack of question-answering datasets that reflect real-world clinical needs. They show that the LLM is more cost-efficient for generating QA pairs, and that RealMedQA has lower lexical similarity between questions and answers than BioASQ, which makes it more challenging for top QA models.
Clinical question answering systems have the potential to provide clinicians with relevant and timely answers to their questions. Nonetheless, despite the advances that have been made, adoption of these systems in clinical settings has been slow. One issue is a lack of question-answering datasets that reflect the real-world needs of health professionals. In this work, we present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM. We describe the process for generating and verifying the QA pairs, and we evaluate several QA models on BioASQ and RealMedQA to assess the relative difficulty of matching answers to questions. We show that the LLM is more cost-efficient for generating "ideal" QA pairs. Additionally, RealMedQA has a lower lexical similarity between questions and answers than BioASQ, which, according to our results, poses an additional challenge to the top two QA models. We release our code and our dataset publicly to encourage further research.
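To make the notion of lexical similarity between a question and its answer concrete, the sketch below computes a simple token-overlap score. The abstract does not say which measure was used, so the choice of Jaccard similarity over lowercased word tokens is an assumption made for illustration, as are the function names `tokens`, `jaccard`, and `mean_similarity`. It is not the authors' implementation.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(question, answer):
    """Jaccard similarity of the two token sets (assumed measure, see lead-in)."""
    q, a = tokens(question), tokens(answer)
    if not q and not a:
        return 0.0  # convention chosen here for two empty texts
    return len(q & a) / len(q | a)


def mean_similarity(pairs):
    """Average question-answer similarity over an iterable of (question, answer)."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(jaccard(q, a) for q, a in pairs) / len(pairs)


if __name__ == "__main__":
    # Invented toy pairs, not taken from RealMedQA or BioASQ.
    toy_pairs = [
        ("alpha beta", "beta gamma"),
        ("delta", "delta"),
    ]
    print(mean_similarity(toy_pairs))
```

Under this kind of measure, a dataset with a lower average score shares fewer words between questions and answers, so a model can rely less on surface word matching to pair them.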