CV · Apr 12, 2023

How you feelin'? Learning Emotions and Mental States in Movie Scenes

Dhruv Srivastava, Aditya Kumar Singh, Makarand Tapaswi

arXiv:2304.05634v18.412 citations · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses emotion understanding for movie story analysis, representing an incremental improvement in multimodal emotion recognition.

The paper tackles the problem of predicting characters' emotions and mental states in movie scenes. It proposes EmoTx, a multimodal Transformer architecture that compares favorably with adapted state-of-the-art emotion recognition approaches in experiments on the MovieGraphs dataset.

Movie story analysis requires understanding characters' emotions and mental states. Towards this goal, we formulate emotion understanding as predicting a diverse and multi-label set of emotions at the level of a movie scene and for each character. We propose EmoTx, a multimodal Transformer-based architecture that ingests videos, multiple characters, and dialog utterances to make joint predictions. By leveraging annotations from the MovieGraphs dataset, we aim to predict classic emotions (e.g. happy, angry) and other mental states (e.g. honest, helpful). We conduct experiments on the most frequently occurring 10 and 25 labels, and a mapping that clusters 181 labels to 26. Ablation studies and comparison against adapted state-of-the-art emotion recognition approaches show the effectiveness of EmoTx. Analyzing EmoTx's self-attention scores reveals that expressive emotions often look at character tokens while other mental states rely on video and dialog cues.
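
To make the multi-label formulation concrete, here is a minimal sketch of how a scene and each character can receive several labels at once: every label gets its own sigmoid probability and is kept if it clears a threshold. This is not EmoTx's actual prediction head. The four label names come from the abstract's examples, while the logits, the character names, the `predict_labels` helper, and the 0.5 threshold are all assumptions made for illustration.

```python
import math

# Hypothetical label vocabulary; the four names are the abstract's examples.
LABELS = ["happy", "angry", "honest", "helpful"]


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(logits, threshold=0.5):
    """Return every label whose independent probability meets the threshold.

    Unlike a softmax over labels, each label is scored on its own,
    so zero, one, or many labels can be predicted together.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up logits for one scene and two characters (illustrative only).
scene_logits = [2.0, -1.5, 0.3, -0.2]
character_logits = {
    "character_a": [1.2, -2.0, 1.0, 0.8],
    "character_b": [-0.5, 1.7, -1.0, -3.0],
}

scene_prediction = predict_labels(scene_logits)
character_predictions = {
    name: predict_labels(z) for name, z in character_logits.items()
}

print(scene_prediction)
print(character_predictions)
```

In a trained model the logits would come from the network's outputs for the scene and for each character rather than from hand-written lists.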
