AI · Apr 13

Dynamic Summary Generation for Interpretable Multimodal Depression Detection

Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen

arXiv:2604.11334 · 88.2 · h-index: 13

Predicted impact: top 16% in AI · last 90 days · Originality: Incremental advance

AI Analysis

For clinicians and patients, this work provides an interpretable depression screening tool that reduces reliance on subjective symptom ratings, though the method is incremental.

The paper proposes a multi-stage LLM-based framework for depression detection that performs binary screening, five-class severity classification, and continuous regression, generating interpretable clinical summaries. On E-DAIC and CMDC datasets, it achieves significant improvements over state-of-the-art baselines in both accuracy and interpretability.

Depression remains widely underdiagnosed and undertreated because stigma and subjective symptom ratings hinder reliable screening. To address this challenge, we propose a coarse-to-fine, multi-stage framework that leverages large language models (LLMs) for accurate and interpretable detection. The pipeline performs binary screening, five-class severity classification, and continuous regression. At each stage, an LLM produces progressively richer clinical summaries that guide a multimodal fusion module integrating text, audio, and video features, yielding predictions with transparent rationale. The system then consolidates all summaries into a concise, human-readable assessment report. Experiments on the E-DAIC and CMDC datasets show significant improvements over state-of-the-art baselines in both accuracy and interpretability.
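
The coarse-to-fine flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `run_pipeline`, `StageResult`, `Report`, `toy_summarize`, and `toy_fuse` are hypothetical, the LLM and the fusion module are replaced by toy stand-in functions, and passing earlier summaries forward as context is an assumption about how the summaries become "progressively richer". The severity bins and the 0-24 score range are likewise placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical layout: modality name -> feature vector.
Features = Dict[str, List[float]]


@dataclass
class StageResult:
    stage: str
    summary: str
    prediction: float


@dataclass
class Report:
    stages: List[StageResult] = field(default_factory=list)

    def render(self) -> str:
        # Consolidate every stage summary into one human-readable report.
        return "\n".join(
            f"[{s.stage}] prediction={s.prediction:g} | {s.summary}"
            for s in self.stages
        )


def run_pipeline(
    transcript: str,
    features: Features,
    summarize: Callable[[str, str, str], str],
    fuse: Callable[[str, str, Features], float],
) -> Report:
    """Run the three stages in order, from coarse to fine."""
    report = Report()
    context = ""  # summaries from earlier stages (assumed to be fed forward)
    for stage in ("binary", "severity", "regression"):
        summary = summarize(stage, transcript, context)
        prediction = fuse(stage, summary, features)
        report.stages.append(StageResult(stage, summary, prediction))
        context = (context + "\n" + summary).strip()
    return report


def toy_summarize(stage: str, transcript: str, context: str) -> str:
    # Stand-in for an LLM call; only reports how much input it saw.
    prior = len(context.splitlines()) if context else 0
    words = len(transcript.split())
    return f"{stage} summary of {words} words, {prior} prior summaries"


def toy_fuse(stage: str, summary: str, features: Features) -> float:
    # Stand-in for multimodal fusion: average the per-modality means.
    # A real module would also condition on the summary text.
    mean = sum(sum(v) / len(v) for v in features.values()) / len(features)
    if stage == "binary":
        return float(mean > 0.5)            # 0 = negative, 1 = positive
    if stage == "severity":
        return float(min(4, int(mean * 5)))  # five classes, 0..4
    return round(mean * 24, 2)               # placeholder continuous score
```

Calling `run_pipeline("I have not slept well lately", {"text": [0.6, 0.8], "audio": [0.4, 0.6], "video": [0.7, 0.7]}, toy_summarize, toy_fuse).render()` returns one line per stage. Swapping the two toy callables for a real LLM client and a trained fusion model would leave the control flow unchanged.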
