CV, Apr 11

Context Matters: Vision-Based Depression Detection Comparing Classical and Deep Approaches

Maneesh Bilalpur, Saurabh Hinduja, Sonish Sivarajkumar, Nicholas Allen, Yanshan Wang, Itir Onal Ertugrul, Jeffrey F. Cohn

arXiv:2604.10344

AI Analysis

For researchers in affective computing, this work provides a direct comparison showing that classical interpretable methods can outperform deep learning in accuracy and fairness for depression detection, though generalizability across contexts remains limited.

The study compared classical (handcrafted features + SVM) and deep (learnt features + MLP) approaches for vision-based depression detection across two contexts (mother-child interactions and patient-clinician interviews). The classical approach achieved higher accuracy and was significantly fairer in the patient-clinician context, while cross-context generalizability was modest for both.

The classical approach to detecting depression from vision emphasizes interpretable features, such as facial expression, and classifiers such as the Support Vector Machine (SVM). With the advent of deep learning, there has been a shift in feature representations and classification approaches. Contemporary approaches use learnt features from general-purpose vision models such as VGGNet to train machine learning models. Little is known about how classical and deep approaches compare in depression detection with respect to accuracy, fairness, and generalizability, especially across contexts. To address these questions, we compared classical and deep approaches to the detection of depression in the visual modality in two different contexts: mother-child interactions in the TPOT database and patient-clinician interviews in the Pitt database. In the former, depression was operationalized as a history of depression per the DSM and current or recent clinically significant symptoms. In the latter, all participants met initial criteria for depression per the DSM, and depression was reassessed over the course of treatment. The classical approach included handcrafted features with SVM classifiers. Learnt features were turn-level embeddings from the FMAE-IAT that were combined with Multi-Layer Perceptron classifiers. The classical approach achieved higher accuracy in both contexts. It was also significantly fairer than the deep approach in the patient-clinician context. Cross-context generalizability was modest at best for both approaches, which suggests that depression may be context-specific.
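The abstract compares approaches on accuracy and fairness but does not say which fairness metric was used. As a minimal sketch, assuming fairness is measured as the gap in accuracy between demographic groups (one common choice, not necessarily the authors'), the two quantities could be computed as follows. The function names, group labels, and toy data are hypothetical and invented purely for illustration.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_group_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: accuracy(ts, ps) for g, (ts, ps) in buckets.items()}


def accuracy_gap(y_true, y_pred, groups):
    """Largest minus smallest per-group accuracy (0.0 means equal accuracy).

    ASSUMPTION: this gap is an illustrative fairness measure only; the
    source does not specify the metric it used.
    """
    scores = per_group_accuracy(y_true, y_pred, groups)
    return max(scores.values()) - min(scores.values())


# Toy labels (1 = depressed, 0 = not depressed), invented for illustration.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

overall = accuracy(y_true, y_pred)
by_group = per_group_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(y_true, y_pred, groups)
```

Under this assumed metric, a smaller gap would indicate a fairer classifier. The same helpers could be applied to predictions from a model trained in one context and tested in the other to gauge cross-context generalizability.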
