CL | Aug 16, 2023

MDDial: A Multi-turn Differential Diagnosis Dialogue Dataset with Reliability Evaluation

Srija Macherla, Man Luo, Mihir Parmar, Chitta Baral

arXiv:2308.08147v14.310 citations | h-index: 30 | Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in medical AI by providing a dataset and an evaluation metric for building Automatic Differential Diagnosis (ADD) dialogue systems, which could improve healthcare accessibility. The contribution is incremental, however, because it builds on existing non-English datasets.

The authors tackle the lack of publicly available English dialogue datasets for Automatic Differential Diagnosis (ADD) by introducing MDDial, a multi-turn differential diagnosis dialogue dataset. They also propose a unified score that evaluates ADD systems by considering the interplay between symptoms and diagnosis. Their experiments show that moderate-size language models perform poorly on MDDial despite their success on general tasks.

Dialogue systems for Automatic Differential Diagnosis (ADD) have a wide range of real-life applications. These dialogue systems are promising for providing easy access and reducing medical costs. Building end-to-end ADD dialogue systems requires dialogue training datasets. However, to the best of our knowledge, there is no publicly available ADD dialogue dataset in English (although non-English datasets exist). Driven by this, we introduce MDDial, the first differential diagnosis dialogue dataset in English, which can aid in building and evaluating end-to-end ADD dialogue systems. Additionally, earlier studies present the accuracy of diagnosis and symptoms either individually or as a combined weighted score. This method overlooks the connection between the symptoms and the diagnosis. We introduce a unified score for the ADD system that takes into account the interplay between symptoms and diagnosis. This score also indicates the system's reliability. To this end, we train two moderate-size language models on MDDial. Our experiments suggest that while these language models can perform well on many natural language understanding tasks, including dialogue tasks in the general domain, they struggle to relate relevant symptoms and disease and thus have poor performance on MDDial. MDDial will be released publicly to aid the study of ADD dialogue research.
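
The abstract does not give the formula for the unified score, so the sketch below is not the authors' metric. It is a hypothetical illustration of the argument that a weighted combination of diagnosis accuracy and symptom accuracy can hide a disconnect between the two. All names here (`DialogueResult`, `weighted_score`, `joint_score`, the `min_recall` threshold) are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DialogueResult:
    """One evaluated dialogue (hypothetical structure, not from the paper)."""
    predicted_symptoms: set[str]
    gold_symptoms: set[str]
    predicted_disease: str
    gold_disease: str


def symptom_recall(r: DialogueResult) -> float:
    """Fraction of the gold symptoms that the system elicited."""
    if not r.gold_symptoms:
        return 0.0
    return len(r.predicted_symptoms & r.gold_symptoms) / len(r.gold_symptoms)


def weighted_score(results: list[DialogueResult], w: float = 0.5) -> float:
    """Weighted sum of diagnosis accuracy and mean symptom recall.

    The two terms are averaged separately, so a lucky diagnosis in one
    dialogue can offset missing symptoms in another.
    """
    n = len(results)
    diag_acc = sum(r.predicted_disease == r.gold_disease for r in results) / n
    sym = sum(symptom_recall(r) for r in results) / n
    return w * diag_acc + (1 - w) * sym


def joint_score(results: list[DialogueResult], min_recall: float = 0.5) -> float:
    """Illustrative joint score (an assumption, not the paper's definition).

    A dialogue counts only if the diagnosis is correct AND it is supported
    by at least `min_recall` of the gold symptoms.
    """
    hits = sum(
        r.predicted_disease == r.gold_disease and symptom_recall(r) >= min_recall
        for r in results
    )
    return hits / len(results)


results = [
    # Correct diagnosis with no supporting symptoms (a lucky guess).
    DialogueResult(set(), {"cough", "fever"}, "Pneumonia", "Pneumonia"),
    # All symptoms found, but the wrong diagnosis.
    DialogueResult({"nausea"}, {"nausea"}, "Enteritis", "Esophagitis"),
]
print(weighted_score(results), joint_score(results))
```

In this toy case the weighted score is 0.5 even though neither dialogue pairs a correct diagnosis with the symptoms that justify it, while the joint score is 0.0. That is the kind of gap a score built on the symptom-diagnosis interplay is meant to expose.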
