CV · AI · Dec 13, 2022

Improving Depression estimation from facial videos with face alignment, training optimization and scheduling

Manuel Lage Cañellas, Constantino Álvarez Casado, Le Nguyen, Miguel Bordallo López

arXiv:2212.06400

Originality: Synthesis-oriented

AI Analysis

This work addresses depression estimation for mental health applications, but its contribution is incremental: it improves preprocessing and training techniques rather than introducing new architectures.

The paper tackles the problem of estimating depression from facial videos by enhancing simple ResNet-50 models with face alignment, data augmentation, and training optimization. Single streams achieve results comparable to sophisticated spatio-temporal models, and score-level fusion of two streams outperforms state-of-the-art methods.

Deep learning models have shown promising results in recognizing depressive states from video-based facial expressions. While successful models typically use 3D-CNNs or video distillation techniques, differences in pretraining, data augmentation, preprocessing, and optimization across experiments make fair architectural comparisons difficult. We instead propose to enhance two simple ResNet-50-based models that use only static spatial information, applying two specific face alignment methods together with improved data augmentation, optimization, and scheduling techniques. Our extensive experiments on benchmark datasets show that single streams achieve results similar to those of sophisticated spatio-temporal models, while the score-level fusion of two different streams outperforms state-of-the-art methods. Our findings suggest that specific modifications to the preprocessing and training process produce noticeable differences in model performance and could hide the actual effect of the improvements originally attributed to the use of different neural network architectures.
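The abstract does not spell out how the two streams are combined. The following is a minimal sketch of score-level fusion under stated assumptions, not the authors' implementation. It assumes that each of the two streams outputs a scalar depression score per video, that a static (frame-level) model's per-frame predictions are averaged into one video score, and that the streams are fused by a weighted mean. The function names, the `weight_a` parameter, and the toy numbers are hypothetical. MAE and RMSE appear only as common regression metrics, not as the paper's confirmed evaluation protocol.

```python
from math import sqrt
from statistics import mean


def video_score(frame_scores):
    """Average per-frame predictions of a static model into one video-level
    score (assumed aggregation rule, chosen for illustration)."""
    return mean(frame_scores)


def fuse_scores(stream_a, stream_b, weight_a=0.5):
    """Score-level fusion: weighted mean of two streams' video-level scores.
    weight_a is a hypothetical parameter; 0.5 gives a plain average."""
    if len(stream_a) != len(stream_b):
        raise ValueError("both streams must score the same videos")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(stream_a, stream_b)]


def mae(pred, target):
    """Mean absolute error between predicted and reference scores."""
    return mean(abs(p - t) for p, t in zip(pred, target))


def rmse(pred, target):
    """Root mean squared error between predicted and reference scores."""
    return sqrt(mean((p - t) ** 2 for p, t in zip(pred, target)))


if __name__ == "__main__":
    # Toy per-frame predictions for two videos from each stream (made-up numbers).
    stream_a = [video_score(f) for f in ([10, 12, 14], [20, 22])]
    stream_b = [video_score(f) for f in ([16, 16], [25, 23, 24])]
    targets = [15, 22]

    fused = fuse_scores(stream_a, stream_b)
    print(fused)                 # [14.0, 22.5]
    print(mae(fused, targets))   # 0.75
    print(rmse(fused, targets))
```

Fusing at the score level keeps each stream's training independent, so the two models can differ in preprocessing (for example, alignment method) without changing either network. The fusion weight would normally be tuned on validation data rather than fixed at 0.5.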

