CY, AI, LG | Apr 2, 2019

The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level

arXiv:1904.07656v1 | 2 citations
Originality: Incremental advance
AI Analysis

This work addresses depression assessment for mental health applications, presenting an incremental improvement in multimodal fusion.

The paper estimates depression levels by fusing acoustic, text, and visual modalities with an attention-based deep neural network, reporting improvements of 7.17% in RMSE and 8.08% in MAE over the previous state of the art on the DAIC-WOZ dataset.
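As a rough illustration of how such figures are typically computed, the sketch below defines RMSE and MAE and a relative error reduction. It assumes the reported percentages are relative reductions in error against a baseline, which this summary does not state explicitly. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)

def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error (assumed definition)."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Illustrative depression scores only; these are not DAIC-WOZ data.
true_scores = [4.0, 10.0, 15.0, 7.0]
predicted = [5.0, 8.0, 14.0, 9.0]
print(rmse(true_scores, predicted), mae(true_scores, predicted))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is one reason the two metrics are usually reported together.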

Depression is a serious medical condition experienced by a large number of people around the world. It significantly affects the way one feels, causing a persistent lowering of mood. In this paper, we propose a novel attention-based deep neural network that facilitates the fusion of multiple modalities, and we use this network to regress the depression level. The network is trained on acoustic, text and visual modalities. Experiments are carried out on the benchmark dataset Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ). The results show empirically that fusing all three modalities gives the most accurate estimate of depression level. Our proposed approach outperforms the state of the art by 7.17% on root mean squared error (RMSE) and 8.08% on mean absolute error (MAE).
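The abstract does not describe the network's internals, so the following is only a generic sketch of attention-weighted fusion followed by a linear regression head, not the authors' architecture. It assumes each modality has already been encoded into a feature vector of the same length. The names `attention_fuse`, `regress`, `query`, `coef`, and `bias` are hypothetical, and all numbers are made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(modalities, query):
    """Weight each modality vector by softmax(query . features) and sum them."""
    scores = [sum(q * x for q, x in zip(query, feats)) for feats in modalities]
    weights = softmax(scores)
    fused = [
        sum(w * feats[i] for w, feats in zip(weights, modalities))
        for i in range(len(query))
    ]
    return fused, weights

def regress(fused, coef, bias):
    """Linear head mapping the fused vector to a scalar depression score."""
    return sum(c * x for c, x in zip(coef, fused)) + bias

# Hypothetical pre-encoded features for the three modalities.
acoustic = [0.2, 0.9]
text = [0.7, 0.1]
visual = [0.4, 0.4]
fused, weights = attention_fuse([acoustic, text, visual], query=[1.0, 0.5])
score = regress(fused, coef=[3.0, 2.0], bias=1.0)
print(weights, score)
```

In a trained model the query vector and the regression parameters would be learned from data; here they are fixed by hand so the weighting step is easy to follow.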

