LGMay 7, 2024

Interpretable Tensor Fusion

Saurabh Varshneya, Antoine Ledent, Philipp Liznerski, Andriy Balinskyy, Purvanshi Mehta, Waleed Mustafa, Marius Kloft

arXiv:2405.04671v14.65 citationsh-index: 7IJCAI

Originality Incremental advance

AI Analysis

This addresses the need for interpretable multimodal learning in practical applications with diverse data types, representing an incremental improvement over existing interpretable methods.

The paper tackles the problem of multimodal learning by introducing interpretable tensor fusion (InTense), a method that learns multimodal data representations and their interpretable fusion, outperforming existing state-of-the-art approaches in accuracy and interpretability on six real-world datasets.

Conventional machine learning methods are predominantly designed to predict outcomes based on a single data type. However, practical applications may encompass data of diverse types, such as text, images, and audio. We introduce interpretable tensor fusion (InTense), a multimodal learning method for training neural networks to simultaneously learn multimodal data representations and their interpretable fusion. InTense can separately capture both linear combinations and multiplicative interactions of diverse data types, thereby disentangling higher-order interactions from the individual effects of each modality. InTense provides interpretability out of the box by assigning relevance scores to modalities and their associations. The approach is theoretically grounded and yields meaningful relevance scores on multiple synthetic and real-world datasets. Experiments on six real-world datasets show that InTense outperforms existing state-of-the-art multimodal interpretable approaches in terms of accuracy and interpretability.

View on arXiv PDF

Similar