Modality-based Factorization for Multimodal Fusion
This work addresses the challenge of interpreting and optimizing multimodal fusion for researchers and practitioners in AI, though it appears incremental as it builds on existing tensor factorization techniques.
The paper tackles the problem of understanding and modulating the relative contribution of each modality in multimodal inference tasks. It proposes Modality-based Redundancy Reduction Fusion (MRRF), which achieves a 1% to 4% improvement on several evaluation measures over state-of-the-art methods on sentiment analysis, personality trait recognition, and emotion recognition datasets.
We propose a novel method, Modality-based Redundancy Reduction Fusion (MRRF), for understanding and modulating the relative contribution of each modality in multimodal inference tasks. The method builds an $(M+1)$-way tensor that captures the high-order relationships between $M$ modalities and the output layer of a neural network model. Applying a modality-based tensor factorization, which adopts different factors for different modalities, removes information in a modality that can be compensated for by other modalities, with respect to the model outputs. This helps to understand the relative utility of the information in each modality. In addition, it yields a simpler model with fewer parameters, so it can also act as a regularizer that helps avoid overfitting. We have applied this method to three different multimodal datasets in sentiment analysis, personality trait recognition, and emotion recognition. We are able to recognize relationships and the relative importance of different modalities in these tasks, and we achieve a 1\% to 4\% improvement on several evaluation measures compared to the state-of-the-art for all three tasks.
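To make the tensor view above concrete, the following is a minimal NumPy sketch of one plausible reading of the idea, not the authors' implementation. It assumes $M = 3$ modalities, a Tucker-style factorization (the abstract says only that different factors are adopted for different modalities), and illustrative dimensions and ranks. All function and variable names are hypothetical. The sketch checks that contracting the modality vectors with the full $(M+1)$-way weight tensor gives the same output as the factorized path, and it compares parameter counts.

```python
import math
import numpy as np


def full_fusion(feats, W):
    """Contract a full 4-way weight tensor (d1, d2, d3, d_out) with the
    outer product of three modality feature vectors."""
    a, v, t = feats
    return np.einsum("i,j,k,ijko->o", a, v, t, W)


def factorized_fusion(feats, factors, core, out_factor):
    """Same map, computed through a Tucker-style factorization (assumed form).

    Each modality has its own factor matrix of shape (d_m, r_m), so each
    modality can be compressed to a different size r_m.
    """
    a, v, t = feats
    Ua, Uv, Ut = factors
    # Project each modality onto its own reduced space.
    pa, pv, pt = Ua.T @ a, Uv.T @ v, Ut.T @ t
    # Combine the reduced modalities through the small core tensor.
    z = np.einsum("p,q,r,pqrs->s", pa, pv, pt, core)
    # Map from the reduced output space back to the output layer.
    return out_factor @ z


def reconstruct_weight(factors, core, out_factor):
    """Rebuild the full weight tensor implied by the factorization."""
    Ua, Uv, Ut = factors
    return np.einsum("pqrs,ip,jq,kr,os->ijko", core, Ua, Uv, Ut, out_factor)


def param_counts(dims, ranks):
    """Parameters in the full tensor vs. core plus factor matrices."""
    full = math.prod(dims)
    factorized = math.prod(ranks) + sum(d * r for d, r in zip(dims, ranks))
    return full, factorized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims = (8, 6, 10, 4)   # illustrative sizes: three modalities + output
    ranks = (3, 2, 5, 4)   # illustrative per-mode ranks
    feats = [rng.standard_normal(d) for d in dims[:3]]
    factors = [rng.standard_normal((d, r)) for d, r in zip(dims[:3], ranks[:3])]
    out_factor = rng.standard_normal((dims[3], ranks[3]))
    core = rng.standard_normal(ranks)

    W = reconstruct_weight(factors, core, out_factor)
    print(np.allclose(full_fusion(feats, W),
                      factorized_fusion(feats, factors, core, out_factor)))
    print(param_counts(dims, ranks))
```

In this reading, a small rank for one modality means that little of its information is kept after factorization, which is one way the per-modality ranks could indicate relative utility. The ranks here are fixed by hand; how they are chosen or tuned is not specified in the abstract.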