CL · AI · Apr 8

EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents

Xueren Ge, Sahil Murtaza, Anthony Cortez, Homa Alemzadeh

arXiv:2604.07549 · 70.1 · h-index: 18

AI Analysis

This work addresses the shortage of annotated multi-party medical dialogues, a gap that affects researchers and practitioners in conversational AI and healthcare. The contribution is incremental, since it builds on existing data generation methods.

The paper tackles the lack of multi-party medical dialogue datasets for conversational diagnosis prediction by generating EMSDialog, a synthetic dataset of 4,414 multi-speaker emergency medical service (EMS) conversations derived from electronic patient care reports (ePCRs). Training augmented with EMSDialog improved the accuracy, timeliness, and stability of diagnosis prediction.

Conversational diagnosis prediction requires models to track evolving evidence in streaming clinical conversations and decide when to commit to a diagnosis. Existing medical dialogue corpora are largely dyadic or lack the multi-party workflow and annotations needed for this setting. We introduce an ePCR-grounded, topic-flow-based multi-agent generation pipeline that iteratively plans, generates, and self-refines dialogues with rule-based factual and topic flow checks. The pipeline yields EMSDialog, a dataset of 4,414 synthetic multi-speaker EMS conversations based on a real-world ePCR dataset, annotated with 43 diagnoses, speaker roles, and turn-level topics. Human and LLM evaluations confirm high quality and realism of EMSDialog using both utterance- and conversation-level metrics. Results show that EMSDialog-augmented training improves accuracy, timeliness, and stability of EMS conversational diagnosis prediction.
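The plan, generate, and self-refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Report`, `plan_topic_flow`, `check_dialogue`, `generate_with_refinement`, `stub_generator`) is hypothetical, the LLM agents are replaced by a plain callable, and the rule-based checks are reduced to a substring test for facts and an exact match on topic order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Turn = Tuple[str, str, str]  # (speaker role, topic, utterance)


@dataclass
class Report:
    """Hypothetical stand-in for an ePCR: facts to cover plus a diagnosis label."""
    facts: List[str]
    diagnosis: str


def plan_topic_flow(report: Report) -> List[str]:
    # Placeholder planner (assumption): an opening topic, then one topic per fact.
    return ["scene_arrival"] + [f"fact:{fact}" for fact in report.facts]


def check_dialogue(report: Report, plan: List[str], turns: List[Turn]) -> List[str]:
    """Rule-based checks: every fact is mentioned and topics follow the plan."""
    problems = []
    text = " ".join(utterance.lower() for _, _, utterance in turns)
    for fact in report.facts:
        if fact.lower() not in text:
            problems.append(f"missing fact: {fact}")
    if [topic for _, topic, _ in turns] != plan:
        problems.append("topic flow deviates from plan")
    return problems


def generate_with_refinement(
    report: Report,
    generate: Callable[[Report, List[str], List[str]], List[Turn]],
    max_rounds: int = 3,
) -> Tuple[List[Turn], List[str]]:
    """Regenerate until the checks pass or the round budget is spent."""
    plan = plan_topic_flow(report)
    turns: List[Turn] = []
    feedback: List[str] = []
    for _ in range(max_rounds):
        # Feedback from the previous round is passed back to the generator.
        turns = generate(report, plan, feedback)
        feedback = check_dialogue(report, plan, turns)
        if not feedback:
            break
    return turns, feedback


def stub_generator(report: Report, plan: List[str], feedback: List[str]) -> List[Turn]:
    """Toy generator standing in for an LLM agent; omits the last fact until told."""
    turns = []
    for topic in plan:
        if not topic.startswith("fact:"):
            turns.append(("medic", topic, "EMS here, what happened?"))
            continue
        fact = topic[len("fact:"):]
        if fact == report.facts[-1] and not feedback:
            turns.append(("medic", topic, "Let me check one more thing."))
        else:
            turns.append(("medic", topic, f"Noting {fact}."))
    return turns


report = Report(facts=["chest pain", "shortness of breath"], diagnosis="example")
turns, problems = generate_with_refinement(report, stub_generator)
```

In this toy run the first draft fails the fact check, the failure is fed back to the generator, and the second draft passes. A real pipeline would replace `stub_generator` with model calls and use richer checks than substring matching.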
