Multimodal Sentiment Analysis with Missing Modality: A Knowledge-Transfer Approach
This work addresses a practical limitation of real-world sentiment analysis, where not all modalities are available. The contribution is incremental, however, since it builds on existing multimodal frameworks.
The paper tackles missing modalities in multimodal sentiment analysis. It proposes a knowledge-transfer network that reconstructs the missing audio modality and a cross-modality attention mechanism for prediction. On three datasets, it achieves significant improvements over baselines and results comparable to those of methods trained with full-modality supervision.
Multimodal sentiment analysis aims to identify the emotions expressed by individuals through visual, language, and acoustic cues. However, most existing research assumes that all modalities are available during both training and testing, which makes these algorithms vulnerable when a modality is missing. In this paper, we propose a novel knowledge-transfer network that translates between modalities to reconstruct the missing audio modality. Moreover, we develop a cross-modality attention mechanism that retains as much information as possible from the reconstructed and observed modalities for sentiment prediction. Extensive experiments on three publicly available datasets demonstrate significant improvements over baselines and results comparable to those of previous methods that use complete multimodal supervision.
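The abstract does not specify the architecture, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' method. A single linear map stands in for the learned knowledge-transfer network. Standard scaled dot-product attention, with the reconstructed audio as queries and an observed modality as keys and values, stands in for the cross-modality attention. The fusion step (mean-pooling plus concatenation) and the linear scorer are also assumptions. All function names (`transfer`, `cross_attention`, `predict_sentiment`) are hypothetical. The sketch also assumes that the reconstructed audio and the observed features share the same dimension.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]  # a sequence of feature frames, one row per time step


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def transfer(observed: Matrix, w_transfer: Matrix) -> Matrix:
    """Hypothetical knowledge-transfer step: map each observed frame
    (e.g. a text or visual feature vector) to a reconstructed audio frame
    with a linear map. A real learned network would likely be deeper."""
    return [matvec(w_transfer, x) for x in observed]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention where one modality supplies the queries
    and another supplies the keys and values (an assumed stand-in)."""
    d = len(keys[0])
    out = []
    for q in queries:
        if len(q) != d:
            raise ValueError("query and key dimensions must match")
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # each output row is a convex combination of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def predict_sentiment(observed: Matrix, w_transfer: Matrix, w_out: Vector) -> float:
    """Reconstruct audio, let it attend over the observed modality,
    concatenate both streams per frame, mean-pool, and apply a linear scorer."""
    audio_hat = transfer(observed, w_transfer)
    attended = cross_attention(audio_hat, observed, observed)
    fused = [a + t for a, t in zip(audio_hat, attended)]  # per-frame concatenation
    pooled = [sum(col) / len(col) for col in zip(*fused)]
    if len(pooled) != len(w_out):
        raise ValueError("w_out must match the fused feature size")
    return sum(w * p for w, p in zip(w_out, pooled))
```

For example, `predict_sentiment([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0, 0.0, 0.0])` runs the whole pipeline on two 2-dimensional text frames with an identity transfer map. Using the reconstructed modality as the query lets it select the observed frames it needs. This is one plausible reading of "retaining information from both modalities," not a confirmed design.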