CL · Apr 29, 2020

Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis

arXiv:2004.14198v21003 citations
AI Analysis

This work addresses the need for better interpretability in multimodal AI systems, for applications such as human-computer interaction. It is incremental in that it builds on existing multimodal methods.

The paper tackles the limited interpretability of multimodal learning on human-centric tasks such as sentiment analysis and emotion recognition. It proposes Multimodal Routing, which dynamically adjusts the weights between input modalities for each sample to identify their importance. This enables both global and local interpretation while keeping performance competitive with state-of-the-art methods.

Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken language. Recent multimodal learning methods that perform strongly on human-centric tasks such as sentiment analysis and emotion recognition are often black boxes with very limited interpretability. In this paper we propose Multimodal Routing, which dynamically adjusts the weights between input modalities and output representations differently for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while maintaining competitive performance compared to state-of-the-art methods.
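The per-sample weighting described in the abstract can be illustrated with a small routing-by-agreement sketch. This is a simplified illustration under stated assumptions, not the authors' implementation: the function name `route`, the array shapes, the choice of seven feature groups (three unimodal, three bimodal, one trimodal), and the plain dot-product agreement are all hypothetical choices made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def route(features, weights, n_iters=3):
    """Hypothetical routing sketch for ONE input sample.

    features: (n_feat, d_in)   one vector per modality / cross-modality feature
    weights:  (n_feat, n_out, d_in, d_out)  assumed learned projections
    Returns routing weights r of shape (n_feat, n_out) and
    output representations ("concepts") of shape (n_out, d_out).
    """
    # votes[i, j] is feature i projected into the space of output j.
    votes = np.einsum("id,ijde->ije", features, weights)
    n_feat, n_out, _ = votes.shape
    # Start from uniform routing weights over the outputs.
    r = np.full((n_feat, n_out), 1.0 / n_out)
    for _ in range(n_iters):
        # Each output representation is a routing-weighted sum of votes.
        concepts = np.einsum("ij,ije->je", r, votes)
        # Agreement: dot product between each vote and each concept.
        agreement = np.einsum("ije,je->ij", votes, concepts)
        # Each feature redistributes its weight across the outputs.
        r = softmax(agreement, axis=1)
    concepts = np.einsum("ij,ije->je", r, votes)
    return r, concepts

# Usage with random stand-in data (7 feature groups, 2 output classes).
rng = np.random.default_rng(0)
features = rng.normal(size=(7, 4))
weights = rng.normal(size=(7, 2, 4, 3)) * 0.1
r, concepts = route(features, weights)
print(r.round(3))  # one row per feature group, one column per output
```

In this sketch, the matrix `r` for a single sample plays the role of a local explanation, and averaging `r` over many samples would give a global trend. How the paper itself normalizes and aggregates these weights is not specified in the abstract above.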

Code Implementations: 1 repo