CV | LG | Oct 20, 2022

SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video

arXiv:2210.11341v1 | 5 citations | h-index: 97
Originality: Synthesis-oriented
AI Analysis

This work addresses video-based emotion recognition for applications such as human-computer interaction. Its contribution is incremental, since it analyzes existing pretext tasks and loss combinations.

The paper tackles apparent emotional reaction recognition from video using self-supervised learning and reports state-of-the-art performance for video-only recognition with continuous annotations.

This work focuses on apparent emotional reaction recognition (AERR) from video-only input, conducted in a self-supervised fashion. The network is first pre-trained on different self-supervised pretext tasks and later fine-tuned on the downstream target task. Self-supervised learning facilitates the use of pre-trained architectures and of larger datasets that might be deemed unfit for the target task, yet can still help the network learn informative representations and hence provide useful initializations for further fine-tuning on smaller, more suitable data. Our contribution is two-fold: (1) an analysis of different state-of-the-art (SOTA) pretext tasks for the video-only apparent emotional reaction recognition architecture, and (2) an analysis of various combinations of regression and classification losses that are likely to improve performance further. Together, these two contributions result in the current state-of-the-art performance for video-only spontaneous apparent emotional reaction recognition with continuous annotations.
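To make the idea of combining regression and classification losses concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not name the losses used, so mean squared error, cross-entropy over discretized labels, the label range [-1, 1], the bin count, and the weight `alpha` are all assumptions chosen for illustration, as are the function names.

```python
import math


def mse(preds, targets):
    """Mean squared error between continuous predictions and labels."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def to_bin(value, num_bins, lo=-1.0, hi=1.0):
    """Map a continuous label in [lo, hi] to a class index (assumed range)."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * num_bins)
    return min(idx, num_bins - 1)  # keep value == hi inside the last bin


def cross_entropy(logits, target_class):
    """Cross-entropy for one sample, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_class]


def combined_loss(preds, logits_batch, targets, num_bins=4, alpha=0.5):
    """Weighted sum of a regression loss and a classification loss.

    preds:        continuous predictions, one per sample
    logits_batch: per-sample class logits of length num_bins
    targets:      continuous ground-truth labels
    alpha:        hypothetical weight of the regression term
    """
    reg = mse(preds, targets)
    cls = sum(
        cross_entropy(logits, to_bin(t, num_bins))
        for logits, t in zip(logits_batch, targets)
    ) / len(targets)
    return alpha * reg + (1.0 - alpha) * cls
```

In this sketch the same continuous annotation supervises both terms: directly for the regression head, and after discretization into bins for the classification head. Setting `alpha` to 1.0 recovers the pure regression loss.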

