LG · AI · Jul 1, 2023

SHARCS: Shared Concept Space for Explainable Multimodal Learning

Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski

arXiv:2307.00316v1 · 11.57 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for explainable and trustworthy multimodal learning, enabling domain-expert intervention and cross-modal analysis, although the contribution appears incremental because it builds on existing concept-based methods.

The paper tackles the problem of opaque reasoning in multimodal deep learning by introducing SHARCS, a concept-based approach that maps interpretable concepts from different modalities into a unified space, leading to improved predictive performance and inherent explainability.

Multimodal learning is an essential paradigm for addressing complex real-world problems, where individual data modalities are typically insufficient to accurately solve a given modelling task. While various deep learning approaches have successfully addressed these challenges, their reasoning process is often opaque, limiting the capabilities for a principled, explainable cross-modal analysis and for any domain-expert intervention. In this paper, we introduce SHARCS (SHARed Concept Space): a novel concept-based approach for explainable multimodal learning. SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept manifold, which leads to an intuitive projection of semantically similar cross-modal concepts. We demonstrate that such an approach can lead to inherently explainable task predictions while also improving downstream predictive performance. Moreover, we show that SHARCS can operate in, and significantly outperform other approaches in, practically significant scenarios such as retrieval of missing modalities and cross-modal explanations. Our approach is model-agnostic and easily applicable to different types (and numbers) of modalities, thus advancing the development of effective, interpretable, and trustworthy multimodal approaches.
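To make the idea of a shared concept space and missing-modality retrieval more concrete, here is a minimal sketch under stated assumptions. It is not the authors' implementation: the linear sigmoid concept encoders, the linear projection into the shared space, the squared-distance alignment loss, and nearest-neighbour retrieval are all illustrative assumptions, and every function name (`concept_activations`, `project`, `alignment_loss`, `retrieve_missing`) is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concept_activations(features: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    # Hypothetical per-modality concept encoder (an assumption, not SHARCS itself):
    # each weight row scores one interpretable concept, and the sigmoid maps the
    # score to an activation in (0, 1).
    return [sigmoid(sum(w * f for w, f in zip(row, features))) for row in weights]


def project(concepts: Sequence[float], proj: Sequence[Sequence[float]]) -> Vector:
    # Hypothetical linear map from one modality's concepts into the shared space.
    return [sum(p * c for p, c in zip(row, concepts)) for row in proj]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    # Assumed objective: mean squared distance between paired cross-modal
    # embeddings; minimising it pulls matching samples together in the shared space.
    return sum(squared_distance(a, b) for a, b in pairs) / len(pairs)


def retrieve_missing(query: Sequence[float], bank: Dict[str, Sequence[float]]) -> str:
    # Nearest-neighbour lookup: return the key of the stored embedding (from the
    # other modality) that lies closest to the query in the shared space.
    return min(bank, key=lambda key: squared_distance(query, bank[key]))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Image sample with two concepts, projected with an identity map for simplicity.
    image_shared = project(concept_activations([2.0, -1.0], identity), identity)
    # Hypothetical shared-space embeddings of text samples (the "missing" modality).
    text_bank = {"caption_a": [0.9, 0.2], "caption_b": [0.1, 0.8]}
    print(retrieve_missing(image_shared, text_bank))  # → caption_a
```

Because retrieval happens in the concept space rather than in raw feature space, the retrieved sample can in principle be explained by which concepts the two modalities share; a real system would learn the encoders and projections jointly instead of fixing them by hand.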
