AI | CL | LG | Feb 3, 2018

Multi-attention Recurrent Network for Human Communication Comprehension

arXiv:1802.00923v1 | 562 citations
Originality Highly original
AI Analysis

This paper addresses the problem of improving AI's ability to understand complex multimodal signals in human communication, proposing a novel method for a known bottleneck.

The paper tackles the challenge of AI comprehending multimodal human communication by proposing the Multi-attention Recurrent Network (MARN), which achieves state-of-the-art performance on six datasets for tasks like sentiment analysis and emotion recognition.

Human face-to-face communication is a complex multimodal signal. We use words (language modality), gestures (vision modality) and changes in tone (acoustic modality) to convey our intentions. Humans easily process and understand face-to-face communication; however, comprehending this form of communication remains a significant challenge for Artificial Intelligence (AI). AI must understand each modality and the interactions between them that shape human communication. In this paper, we present a novel neural architecture for understanding human communication called the Multi-attention Recurrent Network (MARN). The main strength of our model comes from discovering interactions between modalities through time using a neural component called the Multi-attention Block (MAB) and storing them in the hybrid memory of a recurrent component called the Long-short Term Hybrid Memory (LSTHM). We perform extensive comparisons on six publicly available datasets for multimodal sentiment analysis, speaker trait recognition and emotion recognition. MARN shows state-of-the-art performance on all the datasets.
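To make the idea of several attentions over the modalities more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract gives no equations, so the function names (`concat_states`, `multi_attention`), the use of a single linear map followed by a softmax for each attention, and the toy dimensions are all assumptions made for illustration. The sketch covers only the attention step for one time step; the recurrent LSTHM component and any later processing of the attended vectors are left out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def concat_states(states_by_modality):
    """Concatenate per-modality hidden states in insertion order (hypothetical helper)."""
    return [x for state in states_by_modality.values() for x in state]


def multi_attention(hidden, attention_weights):
    """Apply K independent attentions to the concatenated hidden state.

    hidden: list of D floats (all modalities concatenated).
    attention_weights: list of K matrices, each D x D (assumed linear scoring).
    Returns a list of K (coefficients, attended_vector) pairs.
    """
    results = []
    for weights in attention_weights:
        coeffs = softmax(matvec(weights, hidden))           # one distribution over D dims
        attended = [c * x for c, x in zip(coeffs, hidden)]  # element-wise re-weighting
        results.append((coeffs, attended))
    return results


# Toy hidden states for one time step (values are made up).
states = {
    "language": [0.5, -0.2],
    "vision": [0.1],
    "acoustic": [0.3, 0.7],
}
hidden = concat_states(states)
dim = len(hidden)

rng = random.Random(0)
num_attentions = 3  # K, chosen arbitrarily for this sketch
weights = [
    [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    for _ in range(num_attentions)
]

for coeffs, attended in multi_attention(hidden, weights):
    print([round(c, 3) for c in coeffs], [round(a, 3) for a in attended])
```

Each attention yields its own distribution over the concatenated dimensions, so different attentions can emphasize different combinations of language, vision and acoustic features at the same time step.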

Code Implementations: 2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
