CAF-Mamba: Mamba-Based Cross-Modal Adaptive Attention Fusion for Multimodal Depression Detection

Bowen Zhou, Marc-André Fiedler, Ayoub Al-Hamadi

arXiv:2601.21648v21.5 | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses depression detection, a critical mental health problem, by improving multimodal fusion methods, though it appears incremental because it builds on existing Mamba-based models and attention mechanisms.

The paper tackles multimodal depression detection by proposing CAF-Mamba, a framework that captures cross-modal interactions both explicitly and implicitly and dynamically adjusts modality contributions, achieving state-of-the-art performance on the benchmark datasets LMVD and D-Vlog.

Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promise, most rely on limited feature types, overlook explicit cross-modal interactions, and employ simple concatenation or static weighting for fusion. To overcome these limitations, we propose CAF-Mamba, a novel Mamba-based cross-modal adaptive attention fusion framework. CAF-Mamba not only captures cross-modal interactions explicitly and implicitly, but also dynamically adjusts modality contributions through a modality-wise attention mechanism, enabling more effective multimodal fusion. Experiments on two in-the-wild benchmark datasets, LMVD and D-Vlog, demonstrate that CAF-Mamba consistently outperforms existing methods and achieves state-of-the-art performance. Our code is available at https://github.com/zbw-zhou/CAF-Mamba.
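The contrast the abstract draws between static weighting and modality-wise attention can be illustrated with a small sketch. This is not the CAF-Mamba implementation (see the repository linked above for that); it is a generic attention-weighted fusion written under stated assumptions: each modality is already encoded as a fixed-length feature vector, and the names `fuse`, `scoring_vector`, `audio`, and `visual` are hypothetical, chosen only for illustration. Each modality receives a scalar score computed from its own features, the scores are normalized with a softmax, and the fused vector is the weighted sum, so the contribution of each modality changes with the input rather than being fixed in advance.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(modalities, scoring_vector):
    """Fuse per-modality feature vectors with input-dependent weights.

    modalities: dict mapping a modality name to a feature list; all lists
        must have the same length as scoring_vector.
    scoring_vector: hypothetical learned parameters (assumed, not taken
        from the paper) that turn a feature vector into a scalar score.
    Returns (fused_vector, weights_by_modality).
    """
    names = list(modalities)
    # One scalar score per modality, computed from that modality's features.
    scores = [
        sum(w * math.tanh(x) for w, x in zip(scoring_vector, modalities[name]))
        for name in names
    ]
    # Normalize the scores so the modality weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(scoring_vector)
    # Weighted sum of the modality features, one dimension at a time.
    fused = [
        sum(wt * modalities[name][i] for wt, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Toy inputs standing in for encoded audio and visual features.
audio = [0.2, -0.1, 0.4]
visual = [1.0, 0.5, -0.3]
fused, weights = fuse({"audio": audio, "visual": visual}, [0.5, -0.2, 0.1])
print(weights)
print(fused)
```

With an all-zero `scoring_vector`, every modality gets the same score and the function reduces to plain averaging, which is the static-weighting baseline the abstract argues against.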
