CL · CV · LG · Aug 11, 2023

Evaluating Picture Description Speech for Dementia Detection using Image-text Alignment

arXiv:2308.07933v1 · 10 citations · h-index: 80
Originality: Incremental advance
AI Analysis

This work addresses dementia detection for healthcare applications by leveraging multimodal data, representing an incremental improvement over existing methods.

The paper tackles dementia detection by using picture description speech, incorporating both picture and text inputs with image-text alignment models to improve accuracy. It achieves state-of-the-art performance with a detection accuracy of 83.44%, compared to a text-only baseline of 79.91%.

Using picture description speech for dementia detection has been studied for 30 years. Despite this long history, previous models focus on identifying differences in speech patterns between healthy subjects and patients with dementia, but they do not use the picture information directly. In this paper, we propose the first dementia detection models that take both the picture and the description texts as inputs and incorporate knowledge from large pre-trained image-text alignment models. We observe differences between dementia and healthy samples in the text's relevance to the picture and in the focused areas of the picture. We therefore hypothesize that these differences can be used to improve dementia detection accuracy. Specifically, we use the text's relevance to the picture to rank and filter the sentences of each sample. We also identify the focused areas of the picture as topics and categorize the sentences according to those areas. We propose three advanced models that pre-process the samples based on their relevance to the picture, to sub-images, and to the focused areas. The evaluation results show that our advanced models, which use knowledge of the picture and of large image-text alignment models, achieve state-of-the-art performance. The best detection accuracy is 83.44%, compared with 79.91% for the text-only baseline model. Lastly, we visualize the sample and picture results to explain the advantages of our models.
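The abstract describes two pre-processing ideas: ranking and filtering sentences by their relevance to the picture, and grouping sentences by the area of the picture they focus on. Below is a minimal sketch of both, written under several stated assumptions. Sentence, picture, and region embeddings are taken as already computed by some pre-trained image-text alignment model (CLIP is one such model, but the abstract does not name the model used here). Relevance is scored as cosine similarity. Filtering keeps the top `keep` sentences. Each sentence is assigned to the region whose embedding it matches best. The function names `rank_and_filter` and `assign_regions`, the region names, and all toy vectors are hypothetical. This is an illustration, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_and_filter(
    sentences: List[str],
    sentence_embs: List[Vector],
    image_emb: Vector,
    keep: int,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score sentences against the picture and keep the `keep` most relevant."""
    scored = [(s, cosine(e, image_emb)) for s, e in zip(sentences, sentence_embs)]
    # Highest image-text similarity first; Python's sort is stable, so ties keep transcript order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def assign_regions(
    sentences: List[str],
    sentence_embs: List[Vector],
    region_embs: Dict[str, Vector],
) -> Dict[str, List[str]]:
    """Hypothetical helper: group sentences by the picture region they are most similar to."""
    groups: Dict[str, List[str]] = {name: [] for name in region_embs}
    for sentence, emb in zip(sentences, sentence_embs):
        best = max(region_embs, key=lambda name: cosine(emb, region_embs[name]))
        groups[best].append(sentence)
    return groups


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real image/text embeddings (hypothetical values).
    sentences = ["a child stands on a stool", "water spills onto the floor", "I don't know"]
    sentence_embs = [[0.9, 0.1], [0.6, 0.8], [-0.2, 1.0]]
    image_emb = [1.0, 0.0]
    regions = {"region_a": [1.0, 0.0], "region_b": [0.0, 1.0]}

    print(rank_and_filter(sentences, sentence_embs, image_emb, keep=2))
    print(assign_regions(sentences, sentence_embs, regions))
```

Cosine similarity is a common choice because alignment models of this kind are usually trained so that matching image-text pairs have a high normalized dot product. A real pipeline might instead use a score threshold, or keep the filtered sentences in their original order before passing them to the classifier.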
