CL · AI · May 19

Synthesis and Evaluation of Long-term History-aware Medical Dialogue

Hebin Hu, Renke Dai, Ah-Hwee Tan, Yilin Kang

arXiv:2605.19766 · 75.3

Predicted impact top 89% in CL · last 90 days

Originality: Incremental advance

AI Analysis

For researchers developing healthcare agents, this provides a much-needed benchmark to evaluate long-term memory and reasoning, filling a gap in existing datasets.

The paper introduces MediLongChat, a synthetic dataset of long-term medical dialogues, and benchmarks showing that even state-of-the-art LLMs struggle with cross-session reasoning tasks.

An effective healthcare agent must be able to recall and reason over a patient's longitudinal medical history. However, the absence of datasets with realistic long-term dialogue timelines limits systematic evaluation. Real clinical text is constrained by privacy and ethics, while existing benchmarks focus on isolated interactions and so fail to capture cross-session reasoning. We introduce a framework for synthesizing high-quality, long-term medical dialogues with LLMs. Our approach entails a knowledge-guided decomposition into three stages: constructing synthetic patient profiles with diverse disease and complication trajectories, generating multi-turn dialogues per encounter, and integrating them into a coherent longitudinal history dataset, MediLongChat. We establish three benchmark tasks (In-dialogue Reasoning, Cross-dialogue Reasoning, and Synthesis Reasoning) to evaluate the memory capabilities of healthcare agents. To assess data quality, we introduce a multi-dimensional evaluation framework combining vector-based metrics with LLM-as-a-judge assessments. Specifically, we define three automatic measures (Faithfulness, Coherence, and Diversity) together with two LLM-based evaluations (Correctness and Realism). Benchmark experiments show that even state-of-the-art LLMs struggle with MediLongChat. These findings highlight the benchmark's applicability and underscore the need for tailored methods to advance healthcare agents.
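To make the abstract's structure concrete, here is a minimal Python sketch of how a longitudinal history (profile, then per-encounter dialogues, then an ordered timeline) might be represented, plus one possible vector-based diversity score. The abstract does not specify a schema or metric definitions, so every name here (`Turn`, `Encounter`, `PatientHistory`, `diversity`) is hypothetical, and mean pairwise cosine distance is an assumed stand-in, not the paper's Diversity measure.

```python
from dataclasses import dataclass, field
from itertools import combinations
from math import sqrt


@dataclass
class Turn:
    speaker: str  # e.g. "patient" or "doctor" (hypothetical labels)
    text: str


@dataclass
class Encounter:
    day: int  # days since the first visit (assumed time unit)
    turns: list[Turn] = field(default_factory=list)


@dataclass
class PatientHistory:
    patient_id: str
    conditions: list[str]
    encounters: list[Encounter] = field(default_factory=list)

    def timeline(self) -> list[Encounter]:
        """Return encounters in chronological order."""
        return sorted(self.encounters, key=lambda e: e.day)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def diversity(vectors):
    """Mean pairwise cosine distance (assumed definition); 0.0 for fewer than two vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - cosine(u, v) for u, v in pairs) / len(pairs)
```

In this sketch, the vectors passed to `diversity` would be dialogue embeddings from whatever encoder is available; identical embeddings score 0.0 and orthogonal ones score 1.0.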
