CV, AI, LG, MM · Jul 27, 2024

Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification on the DAIC-WOZ

arXiv:2407.19340v5 · 6 citations · h-index: 2
AI Analysis

This work addresses the challenge of improving diagnostic accuracy for Major Depressive Disorder, which affects millions globally, by introducing a new multi-modal approach that outperforms all baseline models and multiple state-of-the-art models on the DAIC-WOZ dataset.

This paper tackles the problem of automated depression classification from clinical interview recordings by proposing a novel tri-modal architecture that integrates large language models, achieving an accuracy of 91.01% and an F1-Score of 85.95% in Leave-One-Subject-Out testing.
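To make the evaluation protocol concrete, here is a minimal sketch of how Leave-One-Subject-Out folds can be generated: each fold holds out every sample from one participant and trains on the rest. The function name `leave_one_subject_out` and the participant IDs are hypothetical and chosen only for illustration; the paper's exact fold construction is not described in this summary.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    # Group sample indices by the subject they belong to.
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    # Each subject is held out exactly once; all other samples form the training set.
    for held_out, test_indices in by_subject.items():
        train_indices = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_indices, test_indices


# Hypothetical toy data: five samples from three participants (IDs are made up).
samples = ["p300", "p300", "p301", "p302", "p302"]
folds = list(leave_one_subject_out(samples))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Because no participant appears in both the training and test indices of a fold, the reported scores reflect performance on people the model has never seen.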

Major Depressive Disorder (MDD) is a pervasive mental health condition that affects 300 million people worldwide. This work presents a novel, BiLSTM-based tri-modal model-level fusion architecture for the binary classification of depression from clinical interview recordings. The proposed architecture takes Mel Frequency Cepstral Coefficients as audio features and Facial Action Units as visual features, and uses a two-shot learning based GPT-4 model to process text data. This is the first work to incorporate large language models into a multi-modal architecture for this task. On both the DAIC-WOZ AVEC 2016 Challenge cross-validation split and the Leave-One-Subject-Out cross-validation split, it surpasses all baseline models and multiple state-of-the-art models. In Leave-One-Subject-Out testing, it achieves an accuracy of 91.01%, an F1-Score of 85.95%, a precision of 80%, and a recall of 92.86%.
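As a quick sanity check on the reported Leave-One-Subject-Out metrics, the F1-Score should equal the harmonic mean of precision and recall. The sketch below uses only the figures quoted above; the helper name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # Convention: F1 is 0 when both inputs are 0.
    return 2 * precision * recall / (precision + recall)


# Values reported for Leave-One-Subject-Out testing.
reported_precision = 0.80
reported_recall = 0.9286

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.4f}")  # 0.8595
```

The result matches the reported F1-Score of 85.95%, so the three figures are mutually consistent.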
