Justin T. Baker

14.6 · HC · Jan 7, 2025 · Code

LlaMADRS: Prompting Large Language Models for Interview-Based Depression Assessment

Gaoussou Youssouf Kebe, Jeffrey M. Girard, Einat Liebenthal et al.

This study introduces LlaMADRS, a novel framework leveraging open-source Large Language Models (LLMs) to automate depression severity assessment using the Montgomery-Asberg Depression Rating Scale (MADRS). We employ a zero-shot prompting strategy with carefully designed cues to guide the model in interpreting and scoring transcribed clinical interviews. Our approach, tested on 236 real-world interviews from the Context-Adaptive Multimodal Informatics (CAMI) dataset, demonstrates strong correlations with clinician assessments. The Qwen 2.5-72b model achieves near-human-level agreement across most MADRS items, with Intraclass Correlation Coefficients (ICC) closely approaching those between human raters. We provide a comprehensive analysis of model performance across different MADRS items, highlighting strengths and current limitations. Our findings suggest that LLMs, with appropriate prompting, can serve as efficient tools for mental health assessment, potentially increasing accessibility in resource-limited settings. However, challenges remain, particularly in assessing symptoms that rely on non-verbal cues, underscoring the need for multimodal approaches in future work.

1.6 · SD · Mar 15, 2017

Deducing the severity of psychiatric symptoms from the human voice

Rita Singh, Justin Baker, Luciana Pennant et al.

Psychiatric illnesses are often associated with multiple symptoms, whose severity must be graded for accurate diagnosis and treatment. This grading is usually done by trained clinicians based on human observations and judgments made within doctor-patient sessions. Current research provides sufficient reason to expect that the human voice may carry biomarkers or signatures of many, if not all, of these symptoms. Based on this conjecture, we explore the possibility of objectively and automatically grading the symptoms of psychiatric illnesses with reference to various standard psychiatric rating scales. Using acoustic data from several clinician-patient interviews within hospital settings, we use non-parametric models to learn and predict the relations between symptom ratings and voice. In the process, we show that different articulatory-phonetic units of speech are able to capture the effects of different symptoms differently, and use this to establish a plausible methodology that could be employed for automatically grading psychiatric symptoms for clinical purposes.

2 Papers