CL May 22, 2018

Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

arXiv:1805.08660v11114 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving emotion recognition accuracy for applications in human-computer interaction, though it appears incremental as it builds on existing multimodal fusion methods.

The paper tackles multimodal affective computing by introducing a hierarchical architecture with attention and word-level fusion that classifies utterance-level sentiment and emotion from text and audio, and it reports outperforming state-of-the-art approaches on published datasets.

Multimodal affective computing, learning to recognize and interpret human affects and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at abstract level, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms the state-of-the-art approaches on published datasets and we demonstrated that our model is able to visualize and interpret the synchronized attention over modalities.
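
To make the idea of word-level fusion with attention more concrete, here is a minimal sketch in plain Python. It is not the paper's model: the function names (`attention_weights`, `word_level_fusion`), the dot-product scoring against a fixed query vector, and the fusion by concatenation followed by summation are all assumptions chosen for illustration. It only assumes what the abstract states, namely that text and audio features are aligned word by word and that each modality has its own attention over the words.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features, query):
    """Score each word vector against a query vector (hypothetical
    dot-product scoring) and normalize the scores with softmax."""
    scores = [sum(f * q for f, q in zip(vec, query)) for vec in features]
    return softmax(scores)


def word_level_fusion(text_feats, audio_feats, text_query, audio_query):
    """Fuse word-aligned text and audio features.

    Assumption: text_feats[i] and audio_feats[i] describe the same word.
    Each modality gets its own attention distribution over the words; the
    attention-weighted vectors are concatenated per word and then summed
    into a single utterance-level vector.
    """
    if len(text_feats) != len(audio_feats):
        raise ValueError("text and audio must be aligned word by word")

    text_alpha = attention_weights(text_feats, text_query)
    audio_alpha = attention_weights(audio_feats, audio_query)

    fused_words = []
    for t_vec, a_vec, w_t, w_a in zip(text_feats, audio_feats,
                                      text_alpha, audio_alpha):
        # Concatenate the two weighted modality vectors for this word.
        fused_words.append([w_t * x for x in t_vec] + [w_a * x for x in a_vec])

    dim = len(fused_words[0])
    utterance = [sum(word[d] for word in fused_words) for d in range(dim)]
    return utterance, text_alpha, audio_alpha
```

In a sketch like this, the two returned attention lists are what one would plot side by side to inspect which words each modality attends to. A trained model would learn the query vectors and feed the utterance vector to a classifier; both steps are omitted here.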

