Enhancing Multimodal Affective Analysis with Learned Live Comment Features
This work addresses the scarcity of live comment data for researchers and practitioners in affective computing, though its contribution is incremental in that it builds on existing multimodal approaches.
The paper tackles the problem of limited live comment data for multimodal affective analysis by constructing the LCAffect dataset and using contrastive learning to train a video encoder that produces synthetic live comment features. These features yield significant performance improvements over state-of-the-art methods on tasks such as sentiment analysis and emotion recognition.
Live comments, also known as Danmaku, are user-generated messages synchronized with video content. They are overlaid directly on streaming videos and capture viewer emotions and reactions in real time. While prior work has leveraged live comments in affective analysis, their use has been limited by the relative rarity of live comments across video platforms. To address this, we first construct the Live Comment for Affective Analysis (LCAffect) dataset, which contains live comments for English and Chinese videos spanning diverse genres that elicit a wide spectrum of emotions. Then, using this dataset, we apply contrastive learning to train a video encoder to produce synthetic live comment features for enhanced multimodal affective content analysis. Through comprehensive experiments on a wide range of affective analysis tasks (sentiment analysis, emotion recognition, and sarcasm detection) in both English and Chinese, we demonstrate that these synthetic live comment features significantly improve performance over state-of-the-art methods.
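As a rough illustration of the contrastive training step described above, the sketch below computes a symmetric InfoNCE-style loss between video-encoder outputs and live comment embeddings. The abstract does not specify the exact objective, so the choice of InfoNCE, the temperature value, and all function names (`l2_normalize`, `info_nce`, `symmetric_info_nce`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(anchors, targets, temperature=0.07):
    """Mean cross-entropy of matching anchors[i] to targets[i] within the batch.

    Hypothetical helper: every other target in the batch acts as a negative.
    """
    a = [l2_normalize(x) for x in anchors]
    t = [l2_normalize(x) for x in targets]
    losses = []
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every target, scaled by temperature.
        logits = [sum(p * q for p, q in zip(ai, tj)) / temperature for tj in t]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


def symmetric_info_nce(video_feats, comment_feats, temperature=0.07):
    """Average of the video-to-comment and comment-to-video losses."""
    return 0.5 * (
        info_nce(video_feats, comment_feats, temperature)
        + info_nce(comment_feats, video_feats, temperature)
    )


# Toy batch: two clips whose video features match their own comment features.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(video, [[1.0, 0.0], [0.0, 1.0]])
# Same batch with the comment features swapped between clips.
shuffled = symmetric_info_nce(video, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is lower when each clip is paired with its own comment features than when the pairs are swapped. Under the assumptions above, minimizing such a loss would push the video encoder's outputs toward the embeddings of the comments that accompany each clip, so the encoder could stand in for real comments on videos that have none.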