AS · LG · Nov 13, 2020

Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables

arXiv:2011.06739v3 · 3 citations
Originality: Incremental advance
AI Analysis

This work provides a more generalizable method for detecting depression using vocal biomarkers, which is important for developing robust diagnostic tools for clinicians.

The paper addresses the challenge of generalizing depression detection models across different databases by proposing a dilated CNN trained on Articulatory Coordination Features (ACFs) that are derived from Vocal Tract Variables (TVs) and extracted from two depression databases. This model achieved a ~10% relative accuracy improvement over cross-corpus evaluations of models trained on a single database.
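To make the "relative" in the reported improvement concrete, here is a minimal sketch of the arithmetic. The function name and the two accuracy values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Return the change of new_acc over baseline_acc as a fraction of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies, NOT figures from the paper.
baseline = 0.70   # assumed: single-database model in a cross-corpus evaluation
combined = 0.77   # assumed: model trained on both databases
gain = relative_improvement(baseline, combined)
print(f"relative improvement: {gain:.1%}")
```

Under these assumed numbers, a 7-point absolute gain corresponds to roughly a 10% relative gain, which is why the two measures should not be read interchangeably.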

Depression detection using vocal biomarkers is a highly researched area. Articulatory coordination features (ACFs) are developed based on the changes in neuromotor coordination due to psychomotor slowing, a key feature of Major Depressive Disorder. However, findings of existing studies are mostly validated on a single database, which limits the generalizability of results. Variability across different depression databases adversely affects the results in cross-corpus evaluations (CCEs). We propose to develop a generalized classifier for depression detection using a dilated Convolutional Neural Network trained on ACFs extracted from two depression databases. We show that ACFs derived from Vocal Tract Variables (TVs) show promise as a robust set of features for depression detection. Our model achieves relative accuracy improvements of ~10% compared to CCEs performed on models trained on a single database. We extend the study to show that fusing TVs and Mel-Frequency Cepstral Coefficients can further improve the performance of this classifier.
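For readers unfamiliar with dilated convolutions, the following is a minimal NumPy sketch of the general idea: the kernel taps are spaced `dilation` samples apart, so a small kernel covers a wider span of the input. It is a 1-D illustration under assumed names (`dilated_conv1d`, `receptive_field`) and toy inputs; it is not the paper's architecture, whose layer configuration is not given here.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D dilated cross-correlation, as used in CNN layers."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    span = (k - 1) * dilation + 1  # input samples covered by one output value
    if len(x) < span:
        raise ValueError("input is shorter than the dilated kernel span")
    out_len = len(x) - span + 1
    out = np.empty(out_len)
    for i in range(out_len):
        taps = x[i : i + span : dilation]  # every `dilation`-th sample
        out[i] = np.dot(taps, kernel)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated layers sharing one kernel size."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Toy input and kernel, for illustration only.
signal = np.arange(10)
y = dilated_conv1d(signal, kernel=[1, 0, -1], dilation=2)
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4])
```

With the toy values above, each output compares samples four positions apart, and three stacked layers with dilations 1, 2, and 4 see 15 input samples while using only three weights per layer.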
