Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization
This work addresses unstable sentiment prediction caused by rapid emotional fluctuations in multimodal sentiment analysis, but it appears incremental because it builds on existing regularization and fusion techniques.
The paper tackles the challenge of achieving consistent sentiment representation across diverse modalities in multimodal sentiment analysis by proposing a dual enhancement strategy that simultaneously improves the temporal and modality dimensions, validated on three standard public datasets.
Achieving consistent sentiment representation across diverse modalities remains a key challenge in multimodal sentiment analysis. In particular, rapid emotional fluctuations over time often introduce instability and degrade prediction performance. To address this challenge, we propose a dual enhancement strategy for robust sentiment representation that simultaneously strengthens the temporal and modality dimensions, guided by targeted mechanisms in both forward and backward propagation. Specifically, in the modality dimension, we introduce a modality-invariant fusion mechanism that captures the common, stable representations shared across different modalities. In the temporal dimension, we impose a specialized sequential variation regularization term that regulates the model's learning trajectory during backward propagation; this term is essentially total variation regularization reduced to one-dimensional linear differences. Extensive experiments on three standard public datasets validate the effectiveness of our proposed approach.
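
To make the temporal term concrete, the following is a minimal sketch of a one-dimensional total variation penalty over a sequence of hidden representations. It is an illustration under stated assumptions, not the paper's implementation: the function name `sequential_variation_penalty`, the `weight` parameter, the `(T, D)` input layout, and the choice of absolute (L1) differences between consecutive time steps are all hypothetical, and the exact norm and weighting used in the paper may differ.

```python
import numpy as np


def sequential_variation_penalty(h, weight=1.0):
    """Hypothetical 1-D total variation penalty over time.

    h      : array of shape (T, D), one D-dimensional representation per time step
             (assumed layout, not taken from the paper).
    weight : scalar regularization strength (assumed hyperparameter).

    Returns weight * sum_t sum_d |h[t+1, d] - h[t, d]|.
    """
    h = np.asarray(h, dtype=float)
    if h.shape[0] < 2:
        # A sequence with fewer than two steps has no temporal variation.
        return 0.0
    # First-order differences between consecutive time steps.
    diffs = h[1:] - h[:-1]
    return weight * float(np.abs(diffs).sum())


# A constant sequence has zero penalty; an oscillating one is penalized.
smooth = np.zeros((4, 2))
jumpy = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
smooth_penalty = sequential_variation_penalty(smooth)
jumpy_penalty = sequential_variation_penalty(jumpy, weight=0.5)
```

In training, a term of this kind would typically be added to the task loss, so that its gradient discourages abrupt step-to-step changes in the representation during backward propagation.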