Synthetic Patient-Physician Dialogue Generation from Clinical Notes Using LLM
This work addresses data scarcity and privacy issues in developing medical dialogue systems for healthcare AI, though it is an incremental improvement over existing synthetic generation methods.
The paper tackles the problem of acquiring training data for medical dialogue systems by proposing SynDial, a method that generates synthetic patient-physician dialogues from clinical notes using an LLM with zero-shot prompting and a feedback loop. The resulting dialogues achieve higher extractiveness and factuality than the baselines, with diversity comparable to GPT-4.
Medical dialogue systems (MDS) enhance patient-physician communication, improve healthcare accessibility, and reduce costs. However, acquiring suitable data to train these systems poses significant challenges. Privacy concerns prevent the use of real conversations, necessitating synthetic alternatives. Synthetic dialogue generation from publicly available clinical notes offers a promising solution, providing realistic data while safeguarding privacy. Our approach, SynDial, uses a single LLM iteratively with zero-shot prompting and a feedback loop to generate and refine high-quality synthetic dialogues. The feedback consists of weighted evaluation scores for similarity and extractiveness. The iterative process continues until the dialogues meet predefined thresholds, and the feedback loop yields superior extractiveness. Additionally, evaluation shows that the generated dialogues outperform the baselines on the factuality metric and have diversity scores comparable to those of GPT-4.
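The generate-score-refine loop described above can be illustrated with a minimal sketch. This is not the SynDial implementation: the scoring functions are simple token-overlap stand-ins for the similarity and extractiveness metrics, and the weights, threshold, round limit, feedback wording, and the names `refine_dialogue` and `fake_llm` are all assumptions made for illustration. The `generate` callable stands in for a zero-shot LLM call.

```python
from typing import Callable, List, Optional, Tuple


def _tokens(text: str) -> List[str]:
    # Naive whitespace tokenizer; a real metric would use a proper tokenizer.
    return text.lower().split()


def similarity(note: str, dialogue: str) -> float:
    """Stand-in similarity (assumption): Jaccard overlap of token sets."""
    a, b = set(_tokens(note)), set(_tokens(dialogue))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extractiveness(note: str, dialogue: str) -> float:
    """Stand-in extractiveness (assumption): share of dialogue tokens found in the note."""
    note_vocab = set(_tokens(note))
    toks = _tokens(dialogue)
    return sum(t in note_vocab for t in toks) / len(toks) if toks else 0.0


def weighted_score(note: str, dialogue: str,
                   w_sim: float = 0.5, w_ext: float = 0.5) -> float:
    # Weighted combination of the two scores; the weights are hypothetical.
    return w_sim * similarity(note, dialogue) + w_ext * extractiveness(note, dialogue)


def refine_dialogue(
    note: str,
    generate: Callable[[str, Optional[str]], str],
    threshold: float = 0.5,
    max_rounds: int = 3,
) -> Tuple[str, float, int]:
    """Regenerate until the weighted score meets the threshold or rounds run out.

    Returns the best dialogue seen, its score, and the number of rounds used.
    """
    feedback: Optional[str] = None
    best, best_score, rounds = "", -1.0, 0
    for rounds in range(1, max_rounds + 1):
        dialogue = generate(note, feedback)  # same model on every round
        score = weighted_score(note, dialogue)
        if score > best_score:
            best, best_score = dialogue, score
        if score >= threshold:
            break
        # Feed the score back so the next prompt can ask for a closer match.
        feedback = (f"Previous score {score:.2f} is below {threshold:.2f}; "
                    "stay closer to the clinical note.")
    return best, best_score, rounds


def fake_llm(note: str, feedback: Optional[str]) -> str:
    # Hypothetical stub: ignores the note at first, then follows the feedback.
    if feedback is None:
        return "Doctor: hello how are you today"
    return ("Doctor: you report chest pain for two days "
            "Patient: yes chest pain for two days")


if __name__ == "__main__":
    note = "patient reports chest pain for two days"
    dialogue, score, rounds = refine_dialogue(note, fake_llm)
    print(rounds, round(score, 3), dialogue)
```

Returning the best-scoring candidate rather than the last one is a design choice of this sketch, so that a run which never reaches the threshold still yields the strongest dialogue it produced.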