Variational Fusion for Multimodal Sentiment Analysis
The paper addresses the fidelity of multimodal representations in tasks such as sentiment analysis, but the contribution appears incremental because it builds on existing variational methods.
The paper tackles information loss in multimodal fusion for sentiment analysis by proposing a variational autoencoder-based fusion approach, which the authors report outperforms state-of-the-art methods by a significant margin on several datasets.
Multimodal fusion is considered a key step in multimodal tasks such as sentiment analysis, emotion detection, question answering, and others. Most of the recent work on multimodal fusion does not guarantee the fidelity of the multimodal representation with respect to the unimodal representations. In this paper, we propose a variational autoencoder-based approach for modality fusion that minimizes information loss between unimodal and multimodal representations. We empirically show that this method outperforms the state-of-the-art methods by a significant margin on several popular datasets.
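The abstract does not give the architecture, so the following is only a minimal sketch of what a variational fusion objective of this kind could look like, not the authors' implementation. It assumes a single linear encoder and decoder, a Gaussian latent with a standard normal prior, a squared-error reconstruction term over the concatenated unimodal vectors, and a hypothetical weight `beta` on the KL term. All names (`init_linear`, `fusion_loss`, the modality dimensions) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Small random weights and zero bias for an illustrative linear layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def fusion_loss(unimodal, params, beta=1.0):
    """Fuse unimodal vectors into a latent code z and score its fidelity.

    unimodal: list of arrays, each of shape (batch, dim_m)
    params:   dict with "mu", "logvar", "dec" entries, each a (W, b) pair
    beta:     assumed weight on the KL term (not taken from the paper)
    """
    x = np.concatenate(unimodal, axis=-1)      # (batch, sum of dim_m)

    w_mu, b_mu = params["mu"]
    w_lv, b_lv = params["logvar"]
    w_dec, b_dec = params["dec"]

    mu = x @ w_mu + b_mu                       # posterior mean
    logvar = x @ w_lv + b_lv                   # posterior log-variance

    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps        # fused (multimodal) representation

    # The decoder tries to recover every unimodal vector from z alone,
    # so a low reconstruction error means little information was lost.
    x_hat = z @ w_dec + b_dec
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))

    # KL divergence between N(mu, sigma^2) and the N(0, I) prior
    kl = np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=-1))

    return z, recon + beta * kl, recon, kl


# Toy batch of 4 samples with made-up text, audio, and video feature sizes
text = rng.standard_normal((4, 8))
audio = rng.standard_normal((4, 5))
video = rng.standard_normal((4, 6))

input_dim, latent_dim = 8 + 5 + 6, 3
params = {
    "mu": init_linear(input_dim, latent_dim),
    "logvar": init_linear(input_dim, latent_dim),
    "dec": init_linear(latent_dim, input_dim),
}

z, loss, recon, kl = fusion_loss([text, audio, video], params)
print(z.shape, float(loss))
```

In a real system the parameters would be trained by gradient descent on this loss, and `z` would feed a downstream sentiment classifier. The reconstruction term is what ties the fused representation back to the unimodal ones in this sketch.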