MeDiaQA: A Question Answering Dataset on Medical Dialogues
MeDiaQA provides a new benchmark for testing AI models on reasoning over medical dialogues, though the contribution is incremental because it builds on existing QA dataset approaches.
The authors introduce MeDiaQA, a novel question answering dataset with 22k multiple-choice questions based on over 11k real medical dialogues covering 150 disease specialties, and develop MeDia-BERT, which achieves 64.3% accuracy compared with human performance of 93%.
In this paper, we introduce MeDiaQA, a novel question answering (QA) dataset constructed from real online medical dialogues. It contains 22k multiple-choice questions annotated by humans for over 11k dialogues with 120k utterances between patients and doctors. The dialogues are collected from haodf.com and dxy.com and cover 150 disease specialties. MeDiaQA is the first QA dataset that requires reasoning over medical dialogues, especially over their quantitative content. The dataset can be used to test the computation, reasoning, and understanding abilities of models across multi-turn dialogues, which makes it more challenging than existing datasets. To address these challenges, we design MeDia-BERT, which achieves 64.3% accuracy, while human performance reaches 93% accuracy. This gap indicates that considerable room for improvement remains.
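As a rough illustration of how accuracy on a multiple-choice benchmark of this kind could be computed, the sketch below assumes a hypothetical record layout (`dialogue`, `question`, `options`, `answer`). The field names, the `accuracy` helper, and the toy examples are invented for illustration and do not reflect the released MeDiaQA format or the MeDia-BERT implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MCQuestion:
    """Hypothetical record layout; not the released MeDiaQA schema."""
    dialogue: List[str]  # utterances between patient and doctor
    question: str
    options: List[str]
    answer: int          # index of the correct option


def accuracy(examples: List[MCQuestion],
             predict: Callable[[MCQuestion], int]) -> float:
    """Fraction of questions whose predicted option index equals the gold index."""
    if not examples:
        raise ValueError("no examples to score")
    correct = sum(1 for ex in examples if predict(ex) == ex.answer)
    return correct / len(examples)


# Toy data invented for illustration only; both questions need arithmetic
# over quantities mentioned in the dialogue.
toy = [
    MCQuestion(
        dialogue=[
            "Patient: I have had a fever for 3 days.",
            "Doctor: Take one tablet twice a day for 5 days.",
        ],
        question="How many tablets will the patient take in total?",
        options=["5", "10", "15", "20"],
        answer=1,  # 2 tablets per day * 5 days = 10
    ),
    MCQuestion(
        dialogue=[
            "Patient: The cough started 14 days ago and the fever 4 days ago.",
            "Doctor: Please come in for a chest X-ray.",
        ],
        question="How many days before the fever did the cough start?",
        options=["10", "14", "4", "18"],
        answer=0,  # 14 - 4 = 10
    ),
]


def always_first(example: MCQuestion) -> int:
    """Trivial baseline that always picks the first option."""
    return 0


if __name__ == "__main__":
    print(f"always-first baseline accuracy: {accuracy(toy, always_first):.2f}")
```

Any model can be scored the same way by wrapping it in a function that maps a record to an option index and passing that function as `predict`.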