SD LG AS
Oct 27, 2022

A knowledge-driven vowel-based approach of depression classification from speech using data augmentation

arXiv:2210.15261v18.310 citations
h-index: 19
Has Code

Originality: Incremental advance

AI Analysis

This work offers clinicians an explainable AI tool to support depression diagnosis. The advance is incremental because it builds on existing speech-based classification approaches.

The paper tackles depression classification from speech by modeling temporal dependencies across utterances and spectrotemporal information at the vowel level, achieving performance comparable to state-of-the-art methods while retaining explainable properties.

We propose a novel explainable machine learning (ML) model that identifies depression from speech by modeling the temporal dependencies across utterances and using the spectrotemporal information at the vowel level. Our method first maps variable-length utterances, at the local level, into fixed-size vowel-based embeddings using a convolutional neural network with a spatial pyramid pooling layer ("vowel CNN"). Depression is then classified at the global level from a group of vowel CNN embeddings that serve as the input to another 1D CNN ("depression CNN"). Different data augmentation methods are designed for training the vowel CNN and the depression CNN. We investigate the performance of the proposed system at various temporal granularities by modeling short, medium, and long analysis windows, corresponding to 10, 21, and 42 utterances, respectively. The proposed method reaches performance comparable to previous state-of-the-art approaches and exhibits explainable properties with respect to the depression outcome. The findings from this work may benefit clinicians by providing additional intuitions during joint human-ML decision-making tasks.
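
The two structural ideas in the abstract, pooling a variable-length input into a fixed-size embedding and grouping embeddings into analysis windows of 10, 21, or 42 utterances, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the pyramid levels `(1, 2, 4)`, the use of max pooling along the time axis only, and the non-overlapping grouping are all assumptions chosen for illustration, and the convolutional layers are omitted entirely.

```python
import math


def spatial_pyramid_pool_1d(feature_map, levels=(1, 2, 4)):
    """Max-pool a variable-length feature map into a fixed-size vector.

    feature_map: list of T frames, each a list of C channel values.
    levels: hypothetical pyramid levels; level n splits the time axis
            into n bins and keeps the per-channel maximum of each bin.
    Returns a flat list of length C * sum(levels), independent of T.
    """
    if not feature_map:
        raise ValueError("feature_map needs at least one frame")
    num_frames = len(feature_map)
    num_channels = len(feature_map[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            # Floor/ceil boundaries guarantee a non-empty bin even
            # when there are fewer frames than bins.
            start = (b * num_frames) // bins
            end = math.ceil((b + 1) * num_frames / bins)
            segment = feature_map[start:end]
            for c in range(num_channels):
                pooled.append(max(frame[c] for frame in segment))
    return pooled


def group_embeddings(embeddings, window):
    """Split a sequence of embeddings into consecutive groups of `window`.

    Non-overlapping grouping is an assumption; the abstract only states
    the window sizes (10, 21, and 42 utterances).
    """
    last_start = len(embeddings) - window
    return [embeddings[i:i + window] for i in range(0, last_start + 1, window)]


# Two inputs of different length map to embeddings of the same size.
short_input = [[float(t), -float(t)] for t in range(5)]
long_input = [[float(t), -float(t)] for t in range(13)]
short_embedding = spatial_pyramid_pool_1d(short_input)
long_embedding = spatial_pyramid_pool_1d(long_input)
print(len(short_embedding), len(long_embedding))  # 14 14

# Group 45 placeholder embeddings into medium windows of 21 utterances.
groups = group_embeddings(list(range(45)), window=21)
print(len(groups), len(groups[0]))  # 2 21
```

Because the output length depends only on the channel count and the pyramid levels, a downstream classifier can consume groups of such embeddings without padding or truncating the underlying inputs.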
