CANAMRF: An Attention-Based Model for Multimodal Depression Detection
This work addresses the challenge of accurately detecting depression from multimodal data, which is important for mental health applications. However, the contribution appears incremental, since it builds on existing attention and fusion techniques.
The paper proposes CANAMRF, an attention-based model for multimodal depression detection. It targets a limitation of previous methods, which treat all modalities equally, and reports state-of-the-art performance on two benchmark datasets.
Multimodal depression detection is an important research topic that aims to predict human mental states using multimodal data. Previous methods treat different modalities equally and fuse them with naïve mathematical operations, without measuring their relative importance, and therefore fail to produce effective multimodal representations for downstream depression tasks. To address this, we present a Cross-modal Attention Network with Adaptive Multi-modal Recurrent Fusion (CANAMRF) for multimodal depression detection. CANAMRF consists of a multimodal feature extractor, an Adaptive Multimodal Recurrent Fusion module, and a Hybrid Attention Module. In experiments on two benchmark datasets, CANAMRF achieves state-of-the-art performance, underscoring the effectiveness of the proposed approach.
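
To make the contrast between equal-weight fusion and importance-aware fusion concrete, the following is a minimal sketch in plain Python. It is not the paper's Adaptive Multimodal Recurrent Fusion module, whose details are not given in the abstract. The function names, the toy feature vectors, and the per-modality importance scores are all hypothetical; in a trained model such scores would presumably be learned rather than set by hand.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def naive_fusion(features):
    """Equal-weight fusion: element-wise mean over all modality vectors."""
    vectors = list(features.values())
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def weighted_fusion(features, scores):
    """Importance-aware fusion: softmax-weighted sum of modality vectors.

    `scores` maps each modality name to a hypothetical importance score.
    Returns the fused vector and the weight assigned to each modality.
    """
    names = list(features)
    weights = softmax([scores[name] for name in names])
    dim = len(features[names[0]])
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy inputs (assumed for illustration only).
features = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "visual": [1.0, 1.0]}
scores = {"text": 2.0, "audio": 0.0, "visual": 0.0}

equal = naive_fusion(features)
fused, weights = weighted_fusion(features, scores)
print("equal-weight fusion:", equal)
print("importance weights:", weights)
print("weighted fusion:", fused)
```

With equal scores the weighted variant reduces to the naive mean, so the only difference between the two is whether the relative importance of each modality is allowed to vary.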