CL | Jan 16, 2025

Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators

arXiv:2501.09484v2 | 11 citations | h-index: 9 | Has Code
AI Analysis

This work addresses the need for more accurate patient simulation in medical AI, offering a tool for better evaluation and synthetic data generation, though it is incremental in improving existing simulation methods.

This paper tackles the problem of simulating patients for evaluating diagnostic models by developing a patient simulator that uses dialogue strategies from real conversations, resulting in higher anthropomorphism and lower hallucination rates. It explores the relationship between inquiry and diagnosis, showing that poor inquiry limits diagnosis effectiveness, with experiments revealing substantial differences in inquiry performance among models.

Recently, large language models have shown great potential to transform online medical consultation. Despite this, most research targets improving diagnostic accuracy with ample information, often overlooking the inquiry phase. Some studies try to evaluate or refine doctor models by using prompt-engineered patient agents. However, prompt engineering alone falls short in accurately simulating real patients. We need to explore new paradigms for patient simulation. Furthermore, the relationship between inquiry and diagnosis remains unexplored. This paper extracts dialogue strategies from real doctor-patient conversations to guide the training of a patient simulator. Our simulator shows higher anthropomorphism and lower hallucination rates, using dynamic dialogue strategies. This innovation offers a more accurate evaluation of diagnostic models and generates realistic synthetic data. We conduct extensive experiments on the relationship between inquiry and diagnosis, showing they adhere to Liebig's law: poor inquiry limits diagnosis effectiveness, regardless of diagnostic skill, and vice versa. The experiments also reveal substantial differences in inquiry performance among models. To delve into this phenomenon, the inquiry process is categorized into four distinct types. Analyzing the distribution of inquiries across these types helps explain the performance differences. The weights of our patient simulator are available at https://github.com/PatientSimulator/PatientSimulator.
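The Liebig's-law claim in the abstract can be pictured with a toy model. The sketch below is only an illustration: it assumes, hypothetically, that inquiry quality and diagnostic skill can each be scored in [0, 1] and that overall effectiveness is capped by the weaker of the two. The function name `consultation_score` and the `min` form are assumptions made here, not the paper's metric.

```python
def consultation_score(inquiry_quality: float, diagnosis_skill: float) -> float:
    """Toy Liebig's-law model (an assumption, not the paper's metric).

    Both inputs are hypothetical scores in [0, 1]. The overall result is
    limited by whichever stage is weaker, like the shortest stave of a barrel.
    """
    for name, value in (("inquiry_quality", inquiry_quality),
                        ("diagnosis_skill", diagnosis_skill)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # The scarcer "resource" bounds the outcome.
    return min(inquiry_quality, diagnosis_skill)


# Strong diagnosis cannot compensate for weak inquiry, and vice versa.
weak_inquiry = consultation_score(inquiry_quality=0.3, diagnosis_skill=0.9)
weak_diagnosis = consultation_score(inquiry_quality=0.9, diagnosis_skill=0.3)
```

In this toy model both calls yield the same score, which mirrors the "and vice versa" in the abstract: improving only the stronger stage leaves the result unchanged.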

Code Implementations: 1 repo