Exploring Machine Learning and Language Models for Multimodal Depression Detection
This work addresses depression detection for mental health applications, but its contribution is incremental because it applies existing methods to a new challenge dataset.
The paper tackles multimodal depression detection by comparing XGBoost, transformer-based architectures, and LLMs on audio, video, and text features, highlighting the strengths and limitations of each model type in capturing depression-related signals.
This paper presents our approach to the first Multimodal Personality-Aware Depression Detection Challenge, focusing on multimodal depression detection using machine learning and deep learning models. We explore and compare the performance of XGBoost, transformer-based architectures, and large language models (LLMs) on audio, video, and text features. Our results highlight the strengths and limitations of each type of model in capturing depression-related signals across modalities, offering insights into effective multimodal representation strategies for mental health prediction.