CL, Jan 16, 2025

Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

Zhaocheng Liu, Quan Tu, Wen Ye, Yu Xiao, Zhishou Zhang, Hengfu Cui, Yalun Zhu, Qiang Ju, Shizheng Li, Jian Xie

arXiv:2501.09484v2

Originality: Incremental advance

AI Analysis

This work addresses the need for more accurate patient simulation in medical AI, offering a tool for better evaluation and synthetic data generation, though it is incremental in improving existing simulation methods.

This paper tackles the problem of simulating patients for evaluating diagnostic models by developing a patient simulator that uses dialogue strategies from real conversations, resulting in higher anthropomorphism and lower hallucination rates. It explores the relationship between inquiry and diagnosis, showing that poor inquiry limits diagnosis effectiveness, with experiments revealing substantial differences in inquiry performance among models.

Recently, large language models have shown great potential to transform online medical consultation. Despite this, most research targets improving diagnostic accuracy with ample information, often overlooking the inquiry phase. Some studies try to evaluate or refine doctor models by using prompt-engineered patient agents. However, prompt engineering alone falls short in accurately simulating real patients, so new paradigms for patient simulation are needed. Furthermore, the relationship between inquiry and diagnosis remains unexplored. This paper extracts dialogue strategies from real doctor-patient conversations to guide the training of a patient simulator. Using dynamic dialogue strategies, our simulator shows higher anthropomorphism and lower hallucination rates. This innovation offers a more accurate evaluation of diagnostic models and generates realistic synthetic data. We conduct extensive experiments on the relationship between inquiry and diagnosis, showing they adhere to Liebig's law: poor inquiry limits diagnosis effectiveness, regardless of diagnostic skill, and vice versa. The experiments also reveal substantial differences in inquiry performance among models. To delve into this phenomenon, the inquiry process is categorized into four distinct types. Analyzing the distribution of inquiries across these types helps explain the performance differences. The weights of our patient simulator are available at https://github.com/PatientSimulator/PatientSimulator.
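The Liebig's-law claim in the abstract can be read as a "weakest factor wins" rule. The sketch below is a toy illustration only, not the paper's method or metric: it assumes inquiry quality and diagnostic skill can each be scored on a hypothetical 0 to 1 scale, and it models overall effectiveness as the minimum of the two. The function name and the scoring scale are assumptions made for this example.

```python
def overall_effectiveness(inquiry_quality: float, diagnosis_skill: float) -> float:
    """Toy Liebig's-law model: the scarcer factor caps the outcome.

    Both inputs are hypothetical scores in [0, 1]; the paper does not
    define such a scale. This is an illustration, not its metric.
    """
    for name, value in (
        ("inquiry_quality", inquiry_quality),
        ("diagnosis_skill", diagnosis_skill),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # The weaker of the two factors bounds the result.
    return min(inquiry_quality, diagnosis_skill)


# A strong diagnoser paired with weak inquiry is capped by the inquiry.
weak_inquiry = overall_effectiveness(inquiry_quality=0.3, diagnosis_skill=0.9)

# The reverse also holds: strong inquiry cannot rescue a weak diagnoser.
weak_diagnosis = overall_effectiveness(inquiry_quality=0.9, diagnosis_skill=0.3)
```

Under this toy model both calls give the same capped value, which mirrors the abstract's "and vice versa": improving only the stronger factor changes nothing until the weaker one improves.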
