CV | Jan 6, 2025

MVP: Multimodal Emotion Recognition based on Video and Physiological Signals

arXiv:2501.03103v1 | 5 citations | h-index: 30 | ECCV Workshops
Originality: Incremental advance
AI Analysis

This work addresses emotion recognition for applications such as human-computer interaction, but its contribution is incremental because it builds on existing multimodal fusion techniques.

The paper tackles emotion recognition by fusing video and physiological signals using a deep learning architecture with attention, achieving performance improvements over previous methods.

Human emotions entail a complex set of behavioral, physiological and cognitive changes. Current state-of-the-art models fuse the behavioral and physiological components using classic machine learning rather than recent deep learning techniques. We propose to fill this gap by designing the Multimodal for Video and Physio (MVP) architecture, streamlined to fuse video and physiological signals. Unlike other approaches, MVP exploits attention to enable the use of long input sequences (1-2 minutes). We study video and physiological backbones suited to long input sequences and evaluate our method against the state of the art. Our results show that MVP outperforms former methods for emotion recognition based on facial videos, EDA, and ECG/PPG.
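The abstract does not spell out MVP's internals, so the following is only a generic illustration of attention-based fusion of video and physiological signals, not the paper's architecture. It assumes each modality has already been encoded by some backbone into a sequence of equal-dimension tokens (video frames, EDA windows, ECG/PPG windows), concatenates the sequences, applies single-head scaled dot-product self-attention, and mean-pools the result into one fused embedding. The function names, the identity query/key/value projections and the toy token values are hypothetical simplifications.

```python
import math
from typing import List

Vector = List[float]

def softmax(scores: Vector) -> Vector:
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens: List[Vector]) -> List[Vector]:
    # Single-head scaled dot-product attention with identity Q/K/V projections
    # (hypothetical simplification; trained models learn these projections).
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average over ALL tokens, so video tokens
        # can attend to physiological tokens and vice versa.
        outputs.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return outputs

def fuse(video: List[Vector], eda: List[Vector], ecg: List[Vector]) -> Vector:
    # Hypothetical early fusion: concatenate the per-modality token sequences,
    # attend jointly, then mean-pool into a single fused embedding.
    tokens = video + eda + ecg
    attended = self_attention(tokens)
    d = len(tokens[0])
    return [sum(t[j] for t in attended) / len(attended) for j in range(d)]

if __name__ == "__main__":
    video = [[0.1, 0.3], [0.2, 0.1], [0.0, 0.4]]  # 3 toy video tokens
    eda = [[0.5, 0.2]]                            # 1 toy EDA token
    ecg = [[0.3, 0.3], [0.4, 0.0]]                # 2 toy ECG/PPG tokens
    print(fuse(video, eda, ecg))
```

One practical consideration for 1-2 minute inputs: joint self-attention compares every token with every other, so its cost grows quadratically with the total token count, which makes compact per-modality token sequences important.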

Foundations

The foundational works for this paper's niche, ranked by how specifically neighbouring papers build on them rather than by global fame.
