DischargeSim: A Simulation Benchmark for Educational Doctor-Patient Communication at Discharge
The work addresses the need for benchmarks that evaluate LLMs in clinical education, with a focus on equitable, personalized patient support. The contribution is incremental, however, because it builds on existing LLM evaluation frameworks.
The paper tackles the problem of evaluating large language models (LLMs) for post-visit patient education at discharge. It introduces DischargeSim, a benchmark that simulates multi-turn conversations and assesses models on dialogue quality, personalized document generation, and patient comprehension. Experiments across 18 LLMs reveal significant gaps in capability and show that larger models do not always achieve better education outcomes.
Discharge communication is a critical yet underexplored component of patient care, where the goal shifts from diagnosis to education. While recent large language model (LLM) benchmarks emphasize in-visit diagnostic reasoning, they fail to evaluate models' ability to support patients after the visit. We introduce DischargeSim, a novel benchmark that evaluates LLMs on their ability to act as personalized discharge educators. DischargeSim simulates post-visit, multi-turn conversations between LLM-driven DoctorAgents and PatientAgents with diverse psychosocial profiles (e.g., health literacy, education, emotion). Interactions are structured across six clinically grounded discharge topics and assessed along three axes: (1) dialogue quality via automatic and LLM-as-judge evaluation, (2) personalized document generation including free-text summaries and structured AHRQ checklists, and (3) patient comprehension through a downstream multiple-choice exam. Experiments across 18 LLMs reveal significant gaps in discharge education capability, with performance varying widely across patient profiles. Notably, model size does not always yield better education outcomes, highlighting trade-offs in strategy use and content prioritization. DischargeSim offers a first step toward benchmarking LLMs in post-visit clinical education and promoting equitable, personalized patient support.
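To make the setup in the abstract concrete, the following is a minimal sketch of a per-topic, multi-turn doctor-patient loop followed by a multiple-choice comprehension score. It is not the authors' implementation: the names `simulate_discharge`, `exam_accuracy`, `PatientProfile`, the stub agents, the placeholder topic labels, and the turn count are all assumptions made for illustration, and the stub agents stand in for real LLM calls.

```python
from dataclasses import dataclass, asdict
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)
# Hypothetical agent interface: (history, context) -> next utterance.
Agent = Callable[[List[Turn], Dict[str, str]], str]


@dataclass
class PatientProfile:
    # Profile dimensions named in the abstract; values are illustrative.
    health_literacy: str
    education: str
    emotion: str


def simulate_discharge(doctor: Agent, patient: Agent, profile: PatientProfile,
                       topics: List[str], turns_per_topic: int = 2) -> List[Turn]:
    """Alternate doctor and patient turns for each discharge topic."""
    history: List[Turn] = []
    for topic in topics:
        context = {"topic": topic, **asdict(profile)}
        for _ in range(turns_per_topic):
            history.append(("doctor", doctor(history, context)))
            history.append(("patient", patient(history, context)))
    return history


def exam_accuracy(answers: Dict[str, str], key: Dict[str, str]) -> float:
    """Fraction of exam questions answered correctly (comprehension axis)."""
    if not key:
        raise ValueError("answer key must not be empty")
    correct = sum(1 for q, a in key.items() if answers.get(q) == a)
    return correct / len(key)


# Stub agents (assumptions): a real run would call an LLM here.
def stub_doctor(history: List[Turn], context: Dict[str, str]) -> str:
    return f"Let's review {context['topic']}."


def stub_patient(history: List[Turn], context: Dict[str, str]) -> str:
    return f"I have a question about {context['topic']}."


# Placeholder labels: the abstract does not name the six topics.
topics = [f"topic_{i}" for i in range(1, 7)]
profile = PatientProfile(health_literacy="low", education="high school",
                         emotion="anxious")
transcript = simulate_discharge(stub_doctor, stub_patient, profile, topics)
```

In this sketch the transcript would then feed the other two axes (dialogue-quality scoring and document generation), which are omitted here because the abstract does not specify their prompts or metrics.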