CL, CV · Oct 27, 2021

Detecting Dementia from Speech and Transcripts using Transformers

arXiv:2110.14769v3 · 54 citations
Originality: Incremental advance
AI Analysis

This work addresses the early diagnosis of Alzheimer's disease, a critical healthcare problem, but it is an incremental advance, since it builds on existing multimodal and transformer approaches.

The paper tackles dementia detection from speech and transcripts by proposing multimodal transformer models that combine speech, represented as images, with text, achieving state-of-the-art results on the ADReSS Challenge dataset.

Alzheimer's disease (AD) is a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since there is no available cure. Alzheimer's is the most common cause of dementia, a general term for loss of memory. Because dementia affects speech, existing research initiatives focus on detecting dementia from spontaneous speech. However, little work has been done on converting speech data to Log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) and on using pretrained models. Similarly, little work has explored transformer networks or the way the two modalities, i.e., speech and transcripts, are combined in a single neural network. To address these limitations, we first represent the speech signal as an image and employ several pretrained models, with the Vision Transformer (ViT) achieving the highest evaluation results. Second, we propose multimodal models. More specifically, our models include a Gated Multimodal Unit, which controls the influence of each modality on the final classification, and crossmodal attention, which effectively captures the relationships between the two modalities. Extensive experiments on the ADReSS Challenge dataset demonstrate the effectiveness of the proposed models and their superiority over state-of-the-art approaches.
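The abstract describes converting speech into Log-Mel spectrograms and MFCCs so that pretrained image models such as ViT can consume it. The following is a minimal NumPy sketch of that conversion, not the authors' pipeline: the frame size (`n_fft=512`), hop length (160 samples), 64 mel bands, 13 MFCCs, the HTK-style mel formula, and the min-max scaling with 3-channel replication in the hypothetical helper `to_image` are all illustrative assumptions.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (one of several common conventions; an assumption here)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an rfft power spectrum onto n_mels bands."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_bins))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def log_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=64):
    """Return a (n_mels, n_frames) log-Mel spectrogram; needs len(signal) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_bins)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T          # (n_frames, n_mels)
    return np.log(mel + 1e-10).T                               # (n_mels, n_frames)


def mfcc(log_mel, n_mfcc=13):
    """Orthonormal DCT-II over the mel axis, keeping the first n_mfcc coefficients."""
    n_mels = log_mel.shape[0]
    n = np.arange(n_mels)
    k = np.arange(n_mfcc)[:, None]
    dct = np.sqrt(2.0 / n_mels) * np.cos(np.pi / n_mels * (n + 0.5) * k)
    dct[0] /= np.sqrt(2.0)
    return dct @ log_mel  # (n_mfcc, n_frames)


def to_image(feature):
    """Hypothetical helper: min-max scale to [0, 1] and repeat into 3 channels."""
    scaled = (feature - feature.min()) / (feature.max() - feature.min() + 1e-10)
    return np.repeat(scaled[None, :, :], 3, axis=0)


# Usage on a synthetic 1-second, 16 kHz tone (a stand-in for a real recording)
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)
lm = log_mel_spectrogram(signal, sr)
mf = mfcc(lm)
img = to_image(lm)
print(lm.shape, mf.shape, img.shape)  # (64, 97) (13, 97) (3, 64, 97)
```

Replicating one feature map into three channels is only one way to match the input shape that ImageNet-pretrained backbones expect; the paper's actual channel layout is not stated here.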
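The Gated Multimodal Unit (GMU; Arevalo et al., 2017) fuses two modality representations through a learned gate. Below is a minimal NumPy sketch of its forward pass, assuming one pooled vector per modality (for instance, a ViT embedding of the speech image and a text-encoder embedding of the transcript). Each input is projected through `tanh`, and a sigmoid gate `z`, computed from both inputs, decides per dimension how much of each projection reaches the fused vector `h = z * h_s + (1 - z) * h_t`. The 768-dimensional inputs, the 128-dimensional output, the random initialization, and the omission of bias terms are illustrative assumptions, not the paper's configuration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GatedMultimodalUnit:
    """Forward pass of a GMU fusing a speech vector and a text vector (biases omitted)."""

    def __init__(self, d_speech, d_text, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W_s = rng.normal(0.0, 0.02, (d_speech, d_out))           # speech projection
        self.W_t = rng.normal(0.0, 0.02, (d_text, d_out))             # text projection
        self.W_z = rng.normal(0.0, 0.02, (d_speech + d_text, d_out))  # gate weights

    def __call__(self, x_s, x_t):
        h_s = np.tanh(x_s @ self.W_s)
        h_t = np.tanh(x_t @ self.W_t)
        # The gate sees both modalities and outputs a per-dimension weight in (0, 1)
        z = sigmoid(np.concatenate([x_s, x_t], axis=-1) @ self.W_z)
        h = z * h_s + (1.0 - z) * h_t  # convex combination of the two projections
        return h, z


# Usage with hypothetical pooled embeddings for a batch of 4 recordings
rng = np.random.default_rng(1)
x_speech = rng.normal(size=(4, 768))
x_text = rng.normal(size=(4, 768))
gmu = GatedMultimodalUnit(768, 768, 128)
h, z = gmu(x_speech, x_text)
print(h.shape, z.shape)  # (4, 128) (4, 128)
```

In a classifier, `h` would feed a final dense layer, and inspecting `z` gives a rough, per-sample view of how strongly each modality was weighted.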
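Crossmodal attention lets one modality query the other: queries come from the target modality, while keys and values come from the source modality, using standard scaled dot-product attention. The NumPy sketch below shows a single head in both directions (text attending to speech and speech attending to text). The sequence lengths (12 transcript tokens; 197 ViT tokens, i.e. 196 patches of a 224×224 image with 16×16 patches plus a class token), the 768-dimensional inputs, the 64-dimensional head, and the random weights are illustrative assumptions; this is not presented as the paper's exact layer.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def crossmodal_attention(target, source, W_q, W_k, W_v):
    """Single-head scaled dot-product attention in which `target` attends to `source`.

    target: (n_target, d_model) sequence that supplies the queries
    source: (n_source, d_model) sequence that supplies the keys and values
    """
    Q = target @ W_q                         # (n_target, d_head)
    K = source @ W_k                         # (n_source, d_head)
    V = source @ W_v                         # (n_source, d_head)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (n_target, n_source)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights


rng = np.random.default_rng(0)
d_model, d_head = 768, 64
text = rng.normal(size=(12, d_model))     # hypothetical transcript token embeddings
speech = rng.normal(size=(197, d_model))  # hypothetical ViT token embeddings


def proj():
    # Hypothetical randomly initialized projection matrix
    return rng.normal(0.0, 0.02, (d_model, d_head))


# Separate weights per direction: text queries speech, and speech queries text
text_to_speech, w_ts = crossmodal_attention(text, speech, proj(), proj(), proj())
speech_to_text, w_st = crossmodal_attention(speech, text, proj(), proj(), proj())
print(text_to_speech.shape, speech_to_text.shape)  # (12, 64) (197, 64)
```

Each output row is a weighted mixture of the other modality's value vectors, so every transcript token receives speech context and vice versa; how the two outputs are pooled or combined before classification is left open here.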
