Modeling emotion in complex stories: the Stanford Emotional Narratives Dataset
The paper addresses the need for high-quality time-series datasets in affective computing, with the aim of improving emotion recognition in naturalistic settings. The contribution is incremental, however, since the modeling side builds on existing approaches.
The paper tackles the problem of modeling dynamic emotions in complex narratives. It introduces the Stanford Emotional Narratives Dataset (SENDv1), a multimodal video dataset annotated for emotional valence over time, and shows that baseline and state-of-the-art models, such as an LSTM and a multimodal VRNN, perform comparably to a human benchmark.
Human emotions unfold over time, and affective computing research must do more to capture this crucial component of real-world affect. Modeling dynamic emotional stimuli requires solving two challenges at once: time-series modeling and the collection of high-quality time-series datasets. We begin by assessing the state of the art in time-series emotion recognition and reviewing contemporary time-series approaches in affective computing, including discriminative and generative models. We then introduce the first version of the Stanford Emotional Narratives Dataset (SENDv1): a set of rich, multimodal videos of self-paced, unscripted emotional narratives, annotated for emotional valence over time. The complex narratives and naturalistic expressions in this dataset provide a challenging test for contemporary time-series emotion recognition models. We demonstrate several baseline and state-of-the-art modeling approaches on SENDv1, including a Long Short-Term Memory model and a multimodal Variational Recurrent Neural Network, which perform comparably to the human benchmark. We end by discussing the implications for future research in time-series affective computing.
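A model-versus-human comparison on continuous valence ratings requires a time-series agreement metric, which the summary above does not name. The sketch below assumes the concordance correlation coefficient (CCC), a metric widely used for continuous affect prediction, and implements it in plain Python. It also assumes a simple protocol: score the model and one held-out human rater against the same reference trace. The helper name `concordance_ccc` and all series values are hypothetical toy illustrations, not SENDv1 data or the authors' evaluation code.

```python
from statistics import fmean


def concordance_ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length series.

    Returns a value in [-1, 1]; 1 means the two series agree exactly.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = fmean(x), fmean(y)
    # Population (not sample) variances and covariance.
    vx = fmean((a - mx) ** 2 for a in x)
    vy = fmean((b - my) ** 2 for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    denom = vx + vy + (mx - my) ** 2
    if denom == 0.0:
        # Both series are the same constant: treat as perfect agreement.
        return 1.0
    return 2.0 * cov / denom


# Hypothetical toy valence traces sampled at the same time steps (not SENDv1 data).
reference = [0.1, 0.3, 0.5, 0.4, 0.2, -0.1]       # assumed aggregate observer rating
model_pred = [0.0, 0.2, 0.4, 0.4, 0.3, 0.0]       # model output for the same clip
held_out_rater = [0.2, 0.3, 0.6, 0.3, 0.1, -0.2]  # one observer, excluded from reference

# Scoring both against the same reference puts the model and the
# assumed human benchmark on a common scale.
print("model CCC:", round(concordance_ccc(model_pred, reference), 3))
print("human CCC:", round(concordance_ccc(held_out_rater, reference), 3))
```

Unlike Pearson correlation, CCC also penalizes differences in mean and scale. A prediction that tracks the shape of the valence curve but sits at a constant offset from it therefore scores below 1.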