AI · Jul 28, 2021

Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis

Wei Han, Hui Chen, Alexander Gelbukh, Amir Zadeh, Louis-Philippe Morency, Soujanya Poria

arXiv:2107.13669v224.3256 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses a key bottleneck in multimodal sentiment analysis for researchers and practitioners by improving fusion schemes to better handle modality interactions.

The paper addresses the competition between independence and relevance among modalities in multimodal sentiment analysis. It proposes a Bi-Bimodal Fusion Network (BBFN) that simultaneously performs fusion and separation on pairwise modality representations, and it reports significant improvements over state-of-the-art methods on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY).

Multimodal sentiment analysis aims to extract and integrate semantic information collected from multiple modalities to recognize the emotions and sentiment expressed in multimodal data. The major concern of this research area is developing an effective fusion scheme that can extract and integrate key information from the various modalities. However, one issue that may have kept previous work from achieving better performance is the lack of proper modeling of the competition between independence and relevance among modalities, which can degrade fusion outcomes by collapsing the modality-specific feature space or introducing extra noise. To mitigate this, we propose the Bi-Bimodal Fusion Network (BBFN), a novel end-to-end network that performs fusion (relevance increment) and separation (difference increment) on pairwise modality representations. The two parts are trained simultaneously so that the competition between them is simulated. The model takes two bimodal pairs as input because of the known information imbalance among modalities. In addition, we leverage a gated control mechanism in the Transformer architecture to further improve the final output. Experimental results on three datasets (CMU-MOSI, CMU-MOSEI, and UR-FUNNY) verify that our model significantly outperforms the SOTA. The implementation of this work is available at https://github.com/declare-lab/multimodal-deep-learning.
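
The idea of a gate that decides how much of one modality flows into another can be illustrated with a minimal sketch. This is a generic element-wise gate, not the BBFN formulation, which the abstract does not spell out. The function name `gated_fuse`, the toy feature vectors, and the choice to pair text with each of the other two modalities are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(source, target, w_source, w_target, bias):
    """Element-wise gated mix of two equal-length feature vectors.

    Hypothetical helper, not taken from the paper. For each position the
    gate is sigmoid(w_source*s + w_target*t + bias). A gate near 1 keeps
    the target's own feature; a gate near 0 takes the source's feature.
    """
    vectors = (source, target, w_source, w_target, bias)
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, t, ws, wt, b in zip(source, target, w_source, w_target, bias):
        gate = sigmoid(ws * s + wt * t + b)
        fused.append(gate * t + (1.0 - gate) * s)
    return fused


# Toy 3-dimensional features (made-up numbers, not real data).
text = [0.5, -1.0, 2.0]
audio = [1.5, 0.0, -2.0]
visual = [-0.5, 1.0, 0.0]
zeros = [0.0, 0.0, 0.0]

# Assumed pairing: text with audio, and text with visual.
# Zero weights and bias give a gate of exactly 0.5, so each output
# is the plain average of the two inputs.
text_audio = gated_fuse(audio, text, zeros, zeros, zeros)
text_visual = gated_fuse(visual, text, zeros, zeros, zeros)

# A large positive bias saturates the gate towards 1, so the target
# (text) passes through almost unchanged.
text_kept = gated_fuse(audio, text, zeros, zeros, [50.0, 50.0, 50.0])
```

In a trained model the gate weights would be learned parameters rather than fixed numbers, so the network itself decides, per feature, how much cross-modal information to admit.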
