Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies
This is an incremental tutorial aimed at researchers and practitioners in AI and human-computer interaction, summarizing existing knowledge without introducing new methods.
This tutorial provides an overview of multi-modal emotion recognition (MER). It covers fundamentals such as emotion representation models, annotation strategies, and computational tasks, and it discusses methodologies including representation learning, feature fusion, classifier optimization, and domain adaptation, before turning to real-world applications.
Humans are emotional creatures. Multiple modalities are often involved when we express emotions, whether we do so explicitly (e.g., facial expression, speech) or implicitly (e.g., text, image). Enabling machines to have emotional intelligence, i.e., the ability to recognize, interpret, process, and simulate emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects of multi-modal emotion recognition (MER). We begin with a brief introduction to widely used emotion representation models and affective modalities. We then summarize existing emotion annotation strategies and the corresponding computational tasks, followed by a description of the main challenges in MER. Furthermore, we present representative approaches to representation learning for each affective modality, feature fusion across affective modalities, classifier optimization for MER, and domain adaptation for MER. Finally, we outline several real-world applications and discuss future directions.
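To make the notion of feature fusion concrete, the following minimal Python sketch contrasts two common fusion styles: feature-level (early) fusion, which concatenates per-modality feature vectors before classification, and decision-level (late) fusion, which averages per-modality class probabilities. It is an illustration only and is not taken from the tutorial; the function names, the modality names ("audio", "face"), and all numeric values are hypothetical.

```python
from math import exp


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_fusion(features_by_modality):
    """Feature-level fusion: concatenate feature vectors.

    Modalities are visited in sorted name order so the layout of the
    fused vector is deterministic.
    """
    fused = []
    for name in sorted(features_by_modality):
        fused.extend(features_by_modality[name])
    return fused


def late_fusion(probs_by_modality, weights=None):
    """Decision-level fusion: weighted average of class probabilities.

    With weights=None every modality counts equally.
    """
    names = sorted(probs_by_modality)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    n_classes = len(probs_by_modality[names[0]])
    return [
        sum(weights[name] * probs_by_modality[name][c] for name in names) / total_weight
        for c in range(n_classes)
    ]


# Hypothetical inputs: made-up features and classifier scores per modality.
features = {"audio": [0.2, 0.7], "face": [0.9, 0.1, 0.4]}
fused_features = early_fusion(features)  # one vector of length 5

probs = {
    "audio": softmax([2.0, 0.5, 0.1]),  # made-up scores over three emotion classes
    "face": softmax([0.3, 0.2, 1.5]),
}
fused_probs = late_fusion(probs)  # still a probability distribution
```

In this sketch, early fusion would hand `fused_features` to a single classifier, whereas late fusion assumes one classifier per modality and combines only their outputs; which is preferable depends on the data and the task.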