Affective Processes: stochastic modelling of temporal context for emotion and facial expression recognition
This work aims to improve emotion recognition accuracy for affective computing applications. Its contribution may appear incremental, since it builds on the existing Neural Processes framework.
The paper tackles the recognition of emotions and facial expressions by addressing two limitations of existing methods: they do not model the uncertainty of the temporal context, and they ignore task-specific temporal dependencies. It reports consistent improvements over strong baselines and state-of-the-art methods across four databases.
Temporal context is key to recognising expressions of emotion. Existing methods that rely on recurrent or self-attention models to enforce temporal consistency operate at the feature level, ignore task-specific temporal dependencies, and fail to model context uncertainty. To alleviate these issues, we build upon the framework of Neural Processes to propose a method for apparent emotion recognition with three key novel components: (a) probabilistic contextual representation with a global latent variable model; (b) temporal context modelling using task-specific predictions in addition to features; and (c) smart temporal context selection. We validate our approach on four databases: two for Valence and Arousal estimation (SEWA and AffWild2) and two for Action Unit intensity estimation (DISFA and BP4D). Results show a consistent improvement over a series of strong baselines as well as over state-of-the-art methods.
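
As a rough illustration of components (a) and (b), the following sketch shows how a Neural Process-style model can summarise a set of context frames, each described by its features together with its task-specific predictions, into a single global Gaussian latent variable, and then sample that variable to obtain predictions with an uncertainty estimate for target frames. It is a minimal sketch under stated assumptions, not the architecture used in this work: the function names (`init_linear`, `encode_context`, `decode_targets`), the single untrained linear layers, the mean pooling, the dimensions, and the bounds on the standard deviation are all hypothetical choices made for brevity.

```python
import numpy as np


def init_linear(rng, n_in, n_out):
    """Random weights for an illustrative, untrained linear layer (hypothetical)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def encode_context(feats, preds, enc):
    """Summarise context frames as the mean and std of a global latent z.

    feats: (n_ctx, d_feat) per-frame features
    preds: (n_ctx, d_out)  per-frame task predictions (e.g. valence and arousal)
    """
    W, b = enc
    x = np.concatenate([feats, preds], axis=1)       # features plus predictions
    r = np.tanh(x @ W + b).mean(axis=0)              # order-invariant mean pooling
    d_z = r.shape[0] // 2
    mu, raw_sigma = r[:d_z], r[d_z:]
    sigma = 0.1 + 0.9 / (1.0 + np.exp(-raw_sigma))   # keep std within (0.1, 1.0)
    return mu, sigma


def decode_targets(target_feats, z, dec):
    """Predict task outputs for target frames, conditioned on one sample of z."""
    W, b = dec
    z_tiled = np.tile(z, (target_feats.shape[0], 1))
    return np.concatenate([target_feats, z_tiled], axis=1) @ W + b


rng = np.random.default_rng(0)
d_feat, d_out, d_z = 8, 2, 4                         # assumed toy dimensions
enc = init_linear(rng, d_feat + d_out, 2 * d_z)
dec = init_linear(rng, d_feat + d_z, d_out)

ctx_feats = rng.normal(size=(5, d_feat))             # 5 context frames
ctx_preds = rng.normal(size=(5, d_out))
tgt_feats = rng.normal(size=(3, d_feat))             # 3 target frames

mu, sigma = encode_context(ctx_feats, ctx_preds, enc)

# Draw several latent samples; their spread reflects context uncertainty.
samples = np.stack([
    decode_targets(tgt_feats, mu + sigma * rng.normal(size=d_z), dec)
    for _ in range(10)
])
pred_mean = samples.mean(axis=0)                     # (3, d_out)
pred_std = samples.std(axis=0)                       # (3, d_out)
```

Because the context is pooled with a mean, the latent distribution does not depend on the order in which context frames are presented, so the context can be treated as a set. Which frames enter that set is the role of component (c), which this sketch does not attempt to model.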