Semi-IIN: Semi-supervised Intra-inter modal Interaction Learning Network for Multimodal Sentiment Analysis
This work addresses multimodal sentiment analysis, a domain-specific problem, with incremental improvements in handling unlabeled data and interaction selection.
The paper tackles the problem of high annotation costs and label ambiguity in multimodal sentiment analysis by proposing Semi-IIN, a semi-supervised network that dynamically selects intra- and inter-modal interactions, achieving new state-of-the-art results on several metrics on the MOSI and MOSEI datasets.
Although multimodal sentiment analysis is a fertile research area that merits further investigation, current approaches incur high annotation costs and suffer from label ambiguity, both of which hinder the acquisition of high-quality labeled data. Furthermore, choosing the right interactions is essential because the significance of intra- and inter-modal interactions can differ across samples. To this end, we propose Semi-IIN, a Semi-supervised Intra-inter modal Interaction learning Network for multimodal sentiment analysis. Semi-IIN integrates masked attention and gating mechanisms, enabling effective dynamic selection after independently capturing intra- and inter-modal interactive information. Combined with a self-training approach, Semi-IIN fully utilizes the knowledge learned from unlabeled data. Experimental results on two public datasets, MOSI and MOSEI, demonstrate the effectiveness of Semi-IIN, establishing a new state of the art on several metrics. Code is available at https://github.com/flow-ljh/Semi-IIN.
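The combination of masked attention and gating described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a single attention head without learned query/key/value projections, a sequence formed by concatenating text, audio, and visual features, and a sigmoid gate over the concatenated intra- and inter-modal outputs. The names `modality_masks`, `masked_attention`, and `gated_fusion`, as well as all shapes and weights, are hypothetical.

```python
import numpy as np

def modality_masks(lengths):
    """Build boolean (intra, inter) attention masks for a sequence made by
    concatenating modalities with the given lengths (illustrative helper)."""
    ids = np.repeat(np.arange(len(lengths)), lengths)  # modality id per position
    intra = ids[:, None] == ids[None, :]  # True where both positions share a modality
    inter = ~intra                        # True where positions come from different modalities
    return intra, inter

def masked_attention(x, mask):
    """Scaled dot-product self-attention restricted to positions where mask is True.
    Assumes every row of mask has at least one True entry."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)      # block disallowed interactions
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x, weights

def gated_fusion(h_intra, h_inter, w, b):
    """Per-feature sigmoid gate that mixes intra- and inter-modal representations."""
    z = np.concatenate([h_intra, h_inter], axis=-1) @ w + b
    gate = 1.0 / (1.0 + np.exp(-z))
    return gate * h_intra + (1.0 - gate) * h_inter, gate

# Toy input: 2 text, 3 audio, and 2 visual positions with 4 features each (assumed sizes).
rng = np.random.default_rng(0)
lengths = [2, 3, 2]
d = 4
x = rng.normal(size=(sum(lengths), d))

intra_mask, inter_mask = modality_masks(lengths)
h_intra, a_intra = masked_attention(x, intra_mask)  # intra-modal interactions only
h_inter, a_inter = masked_attention(x, inter_mask)  # inter-modal interactions only

w = rng.normal(size=(2 * d, d)) * 0.1  # random stand-in for learned gate weights
b = np.zeros(d)
fused, gate = gated_fusion(h_intra, h_inter, w, b)
```

Because the gate is computed from the input itself, the balance between intra- and inter-modal information can vary per sample and per feature, which is the kind of dynamic selection the abstract describes. In a trained model the gate weights would be learned rather than sampled at random.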