ASCL Feb 13, 2022

Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings

arXiv:2202.06238v1
20 citations
Originality: Incremental advance
AI Analysis

This work addresses depression classification for mental health applications, but it is incremental as it builds on existing multimodal and attention-based methods.

The paper tackles multimodal depression classification by combining articulatory coordination features from audio with hierarchical attention-based text embeddings, improving the area under the ROC curve by 7.5% over the audio-only classifier and by 13.7% over the text-only classifier.
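
For readers unfamiliar with the metric, the following is a minimal sketch of how the area under the ROC curve can be computed from classifier scores, using its pairwise-ranking interpretation. The function name `auroc` and the toy labels and scores are illustrative assumptions, not taken from the paper, and the summary does not state whether the reported 7.5% and 13.7% gains are relative or absolute.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = depressed, 0 = not depressed.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auroc(toy_labels, toy_scores)
```

The quadratic loop is fine for small evaluation sets; a sort-based rank computation would be the usual choice for larger ones.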

Multimodal depression classification has gained immense popularity in recent years. We develop a multimodal depression classification system using articulatory coordination features extracted from vocal tract variables and text transcriptions obtained from an automatic speech recognition tool. The system improves the area under the receiver operating characteristic curve over uni-modal classifiers (by 7.5% and 13.7% for audio and text, respectively). We show that, when training data is limited, a segment-level classifier can be trained first and then used to obtain a session-wise prediction without hindering performance, using a multi-stage convolutional recurrent neural network. A text model is trained using a Hierarchical Attention Network (HAN). The multimodal system is developed by combining embeddings from the session-level audio model and the HAN text model.
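
The overall data flow described in the abstract (segment-level audio representations reduced to one session-level embedding, then combined with a text embedding) might be sketched as below. This is only an illustration under stated assumptions: mean pooling stands in for the paper's multi-stage convolutional recurrent network, concatenation is an assumed fusion method (the abstract says only that embeddings are "combined"), and every function name, weight, and number here is hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(segment_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average segment-level vectors into one session-level vector.

    Assumption: a simple stand-in for the learned session-level stage.
    """
    if not segment_embeddings:
        raise ValueError("a session needs at least one segment")
    n = len(segment_embeddings)
    dim = len(segment_embeddings[0])
    return [sum(seg[i] for seg in segment_embeddings) / n for i in range(dim)]


def fuse(audio_emb: Sequence[float], text_emb: Sequence[float]) -> List[float]:
    """Combine the two modality embeddings by concatenation (assumed)."""
    return list(audio_emb) + list(text_emb)


def predict_proba(fused: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic output layer over the fused embedding (hypothetical weights)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused embedding size")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy session: two audio segments with 2-d embeddings and a 1-d text embedding.
session_audio = mean_pool([[1.0, 2.0], [3.0, 4.0]])
fused_embedding = fuse(session_audio, [0.5])
probability = predict_proba(fused_embedding, [0.0, 0.0, 0.0], 0.0)
```

In a trained system the pooling and the output layer would be learned jointly or in stages; the fixed zero weights above exist only to make the example self-contained.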

