HC SD AS
Oct 29, 2019

DEPA: Self-Supervised Audio Embedding for Depression Detection

Pingyue Zhang, Mengyue Wu, Heinrich Dinkel, Kai Yu

arXiv:1910.13028v321.977 citations

Originality: Incremental advance

AI Analysis

This work addresses depression detection for mental health applications, but the advance is incremental: it applies self-supervised learning specifically to audio processing for this task.

The paper tackles the problems of limited data availability and representation learning in depression detection by proposing DEPA, a self-supervised, pretrained audio embedding method, which achieves significant performance gains on downstream tasks on the DAIC and MDD datasets.

Depression detection research has increased over the last few decades, but limited data availability and representation learning remain a major bottleneck. Recently, self-supervised learning has seen success in pretraining text embeddings and has been applied broadly to related tasks with sparse data, while pretrained audio embeddings based on self-supervised learning are rarely investigated. This paper proposes DEPA, a self-supervised, pretrained depression audio embedding method for depression detection. An encoder-decoder network is used to extract DEPA on in-domain depression datasets (DAIC and MDD) and out-of-domain datasets (Switchboard, Alzheimer's). With DEPA as the audio embedding extracted at the response level, a significant performance gain is achieved on downstream tasks, evaluated on both sparse datasets like DAIC and a large major depression disorder dataset (MDD). This paper not only presents a novel embedding extraction method that captures response-level representations for depression detection but, more significantly, explores self-supervised learning for a specific task within audio processing.
