CL AIJan 1

DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection

Yuxin Li, Xiangyu Zhang, Yifei Li, Zhiwei Guo, Haoyang Zhang, Eng Siong Chng, Cuntai Guan

arXiv:2601.00303v10.6h-index: 13

Originality Highly original

AI Analysis

This addresses robustness issues in mental health screening systems, particularly for camouflaged depression where individuals maintain positive language despite depressive states.

The paper tackles the problem of semantic bias in depression detection models, where models learn shortcuts from linguistic sentiment rather than true depression indicators, by proposing DepFlow - a three-stage depression-conditioned text-to-speech framework that creates acoustic-semantic mismatches to mitigate this bias. The resulting Camouflage Depression-oriented Augmentation dataset improves depression detection macro-F1 by 9%, 12%, and 5% across three architectures.

Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment and diagnostic labels, encouraging models to learn semantic shortcuts. As a result, model robustness may be compromised in real-world scenarios, such as Camouflaged Depression, where individuals maintain socially positive or neutral language despite underlying depressive states. To mitigate this semantic bias, we propose DepFlow, a three-stage depression-conditioned text-to-speech framework. First, a Depression Acoustic Encoder learns speaker- and content-invariant depression embeddings through adversarial training, achieving effective disentanglement while preserving depression discriminability (ROC-AUC: 0.693). Second, a flow-matching TTS model with FiLM modulation injects these embeddings into synthesis, enabling control over depressive severity while preserving content and speaker identity. Third, a prototype-based severity mapping mechanism provides smooth and interpretable manipulation across the depression continuum. Using DepFlow, we construct a Camouflage Depression-oriented Augmentation (CDoA) dataset that pairs depressed acoustic patterns with positive/neutral content from a sentiment-stratified text bank, creating acoustic-semantic mismatches underrepresented in natural data. Evaluated across three depression detection architectures, CDoA improves macro-F1 by 9%, 12%, and 5%, respectively, consistently outperforming conventional augmentation strategies in depression Detection. Beyond enhancing robustness, DepFlow provides a controllable synthesis platform for conversational systems and simulation-based evaluation, where real clinical data remains limited by ethical and coverage constraints.

View on arXiv PDF

Similar