CL · SD · AS · May 20, 2023

Self-supervised representations in speech-based depression detection

arXiv:2305.12263v2 · 55 citations
AI Analysis

This work addresses data scarcity in speech-based depression detection. It is an incremental improvement within a domain-specific application.

The paper tackles training-data sparsity in speech-based depression detection by using foundation models pre-trained with self-supervised learning (SSL) and by transferring knowledge from automatic speech recognition (ASR) and emotion recognition. It reports state-of-the-art results on the DAIC-WOZ dataset using real ASR transcriptions.

This paper proposes handling training-data sparsity in speech-based automatic depression detection (SDD) using foundation models pre-trained with self-supervised learning (SSL). An analysis of SSL representations derived from different layers of pre-trained foundation models is first presented for SDD, which provides insight into suitable indicators for depression detection. Knowledge transfer is then performed from automatic speech recognition (ASR) and emotion recognition to SDD by fine-tuning the foundation models. Results show that using oracle transcriptions and using ASR transcriptions yield similar SDD performance when the hidden representations of the ASR model are incorporated along with the ASR textual information. By integrating representations from multiple foundation models, state-of-the-art SDD results based on real ASR were achieved on the DAIC-WOZ dataset.
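The layer-wise analysis and the integration of several foundation models mentioned in the abstract can be illustrated with a minimal sketch. It assumes frame-level hidden states have already been extracted from every layer of each model, and it uses mean pooling over time and simple concatenation as stand-in choices; the paper's actual pooling, layer selection, and fusion method may differ. The functions `mean_pool`, `layerwise_features`, and `fuse_models`, and the model labels used below, are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
# hidden_states[layer][frame][dim]: frame-level outputs of every layer of one model
LayerStates = Sequence[Sequence[Sequence[float]]]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level vectors over time into one utterance-level vector."""
    if not frames:
        raise ValueError("cannot pool an empty sequence of frames")
    dim = len(frames[0])
    sums = [0.0] * dim
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("all frames must share the same dimension")
        for i, value in enumerate(frame):
            sums[i] += value
    return [s / len(frames) for s in sums]


def layerwise_features(hidden_states: LayerStates) -> List[Vector]:
    """Pool each layer separately, giving one candidate feature vector per layer.

    Each of these vectors could be fed to its own depression classifier to
    compare how informative the different layers are.
    """
    return [mean_pool(layer) for layer in hidden_states]


def fuse_models(per_model_states: Dict[str, LayerStates],
                chosen_layer: Dict[str, int]) -> Vector:
    """Concatenate the pooled vector of one chosen layer from each model.

    Models are visited in sorted name order so the fused layout is deterministic.
    """
    fused: Vector = []
    for name in sorted(per_model_states):
        layers = layerwise_features(per_model_states[name])
        fused.extend(layers[chosen_layer[name]])
    return fused


# Toy usage: a 2-layer model with 2-dim states and a 1-layer model with 1-dim states
ssl_states = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
asr_states = [[[5.0], [7.0]]]
print(layerwise_features(ssl_states))
print(fuse_models({"ssl_model": ssl_states, "asr_model": asr_states},
                  {"ssl_model": 1, "asr_model": 0}))
```

Concatenation keeps every model's pooled features side by side for a downstream classifier; a learned weighted sum over layers or an attention-based fusion would be equally plausible alternatives under the same assumptions.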
