LG · SD · Dec 8, 2025

A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data

arXiv:2512.07741v1 · 1 citation · h-index: 13 · Sci Rep
Originality: Incremental advance
AI Analysis

This paper addresses the need for transparent, explainable AI tools that support clinicians in psychiatric assessment by integrating multimodal data. Its contribution is incremental: it applies Bayesian networks to this domain.

The paper tackles the prediction of depression and anxiety symptoms from voice and speech data using a Bayesian network model, achieving ROC-AUC scores of 0.842 for depression and 0.831 for anxiety on a dataset of 30,135 unique speakers.
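To make the idea of a Bayesian network combining evidence from different modalities concrete, here is a minimal sketch in Python. It is not the paper's model: the two-cue structure (one symptom node with an acoustic and a linguistic child), the names, and every probability are hypothetical values chosen only for illustration.

```python
# Toy network: Symptom -> AcousticCue, Symptom -> LinguisticCue.
# Every probability below is a made-up illustration, not a value from the paper.
P_SYMPTOM = 0.2                          # prior P(symptom present)
P_ACOUSTIC = {True: 0.7, False: 0.2}     # P(acoustic cue observed | symptom state)
P_LINGUISTIC = {True: 0.6, False: 0.1}   # P(linguistic cue observed | symptom state)


def likelihood(table, symptom, observed):
    """Return P(cue == observed | symptom) from a conditional table."""
    p = table[symptom]
    return p if observed else 1.0 - p


def posterior(acoustic=None, linguistic=None):
    """Return P(symptom | evidence) by enumeration; None means the cue is unobserved."""
    joint = {}
    for symptom in (True, False):
        p = P_SYMPTOM if symptom else 1.0 - P_SYMPTOM
        if acoustic is not None:
            p *= likelihood(P_ACOUSTIC, symptom, acoustic)
        if linguistic is not None:
            p *= likelihood(P_LINGUISTIC, symptom, linguistic)
        joint[symptom] = p
    # Normalise over the two symptom states.
    return joint[True] / (joint[True] + joint[False])


if __name__ == "__main__":
    print(posterior())                                 # prior only
    print(posterior(acoustic=True))                    # one modality observed
    print(posterior(acoustic=True, linguistic=True))   # both modalities observed
```

Because unobserved cues are simply left out of the product, the same network yields a posterior when a modality is missing, and each factor in the product can be inspected, which is one way such models lend themselves to explanation.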

During psychiatric assessment, clinicians observe not only what patients report, but also important nonverbal signs such as tone, speech rate, fluency, responsiveness, and body language. Weighing and integrating these different information sources is a challenging task and a good candidate for support by intelligence-driven tools; however, this has yet to be realized in the clinic. Here, we argue that several important barriers to adoption can be addressed using Bayesian network modelling. To demonstrate this, we evaluate a model for depression and anxiety symptom prediction from voice and speech features in large-scale datasets (30,135 unique speakers). Alongside performance for conditions and symptoms (depression and anxiety: ROC-AUC = 0.842 and 0.831, ECE = 0.018 and 0.015, respectively; core individual symptoms: ROC-AUC > 0.74), we assess demographic fairness and investigate integration across, and redundancy between, different input modality types. Clinical usefulness metrics and acceptability to mental health service users are also explored. When provided with sufficiently rich and large-scale multimodal data streams, and specified to represent common mental conditions at the symptom rather than the disorder level, such models are a principled approach to building robust assessment support tools: they provide clinically relevant outputs in a transparent and explainable format that is directly amenable to expert clinical supervision.
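For readers unfamiliar with the two headline metrics, the sketch below shows one common way to compute ROC-AUC (discrimination) and expected calibration error, ECE (how closely predicted probabilities match observed rates). It is an illustration under stated assumptions, not the authors' evaluation code: the equal-width binning with ten bins and the function names are choices made here, and the abstract does not say which ECE variant was used.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted mean gap between mean predicted probability and observed rate per bin.

    Assumption: equal-width bins over [0, 1]; other binning schemes exist.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(observed - predicted)
    return ece


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities, for illustration only.
    y_true = [0, 0, 1, 1]
    y_prob = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_prob))
    print(expected_calibration_error(y_true, y_prob))
```

A ROC-AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while an ECE near 0 means the predicted probabilities can be read roughly at face value.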

