Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
This work addresses an important healthcare challenge, detecting Mild Cognitive Impairment in multilingual speakers, and improves detection accuracy. Its contribution is incremental, since it builds on prior single-picture, English-focused methods.
The paper tackles the detection of Mild Cognitive Impairment from multilingual, multi-picture descriptions by proposing a framework that combines supervised contrastive learning, image modality integration, and a Product of Experts strategy. Compared to a text-only baseline, the framework raises Unweighted Average Recall by 7.1 percentage points and F1 score by 2.9 percentage points.
Detecting Mild Cognitive Impairment (MCI) from picture descriptions is critical yet challenging, especially in multilingual and multi-picture settings. Prior work has primarily focused on English speakers describing a single picture (e.g., the 'Cookie Theft'). The TAUKDIAL-2024 challenge expands this scope by introducing multilingual speakers and multiple pictures, which presents new challenges in analyzing picture-dependent content. To address these challenges, we propose a framework with three components: (1) enhancing discriminative representation learning via supervised contrastive learning, (2) incorporating the image modality rather than relying solely on the speech and text modalities, and (3) applying a Product of Experts (PoE) strategy to mitigate spurious correlations and overfitting. Our framework improves MCI detection performance, achieving a 7.1 percentage point increase in Unweighted Average Recall (UAR) (from 68.1% to 75.2%) and a 2.9 percentage point increase in F1 score (from 80.6% to 83.5%) compared to the text unimodal baseline. Notably, the contrastive learning component yields greater gains for the text modality than for the speech modality. These results highlight our framework's effectiveness in multilingual and multi-picture MCI detection.
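To make component (1) more concrete, the following is a minimal sketch of a supervised contrastive loss over one batch of embeddings. The abstract does not specify the exact loss, so this assumes a commonly used formulation in which each anchor is pulled toward same-label samples and pushed away from all others. The function name `supcon_loss`, the `temperature` value, and the toy embeddings are illustrative assumptions, not details taken from the paper.

```python
import math


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss for one batch (illustrative sketch).

    embeddings: list of equal-length float vectors (hypothetical features).
    labels:     list of class labels, e.g. 0 = control, 1 = MCI.
    Assumption: a standard SupCon-style loss; the paper may differ.
    """
    # L2-normalise so that dot products are cosine similarities.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples in the batch sharing the anchor's label.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarity to every other sample in the batch.
        sims = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )

        # Average negative log-probability of the positives.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0


# Toy batch: two tight clusters in 2-D.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_aligned = supcon_loss(toy_embeddings, [0, 0, 1, 1])  # labels match clusters
loss_mixed = supcon_loss(toy_embeddings, [0, 1, 0, 1])    # labels cut across clusters
```

In this toy batch, `loss_aligned` is smaller than `loss_mixed`, because the loss is low when same-label samples already sit close together in embedding space. That is the property the discriminative representation learning step relies on.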