CL
May 22, 2018

Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment

Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic

arXiv:1805.08660v1 32.31114 citations

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving emotion recognition accuracy for applications in human-computer interaction, though it appears incremental as it builds on existing multimodal fusion methods.

The paper tackles multimodal affective computing by introducing a hierarchical architecture with attention and word-level fusion for sentiment and emotion classification from text and audio, and it reports state-of-the-art performance on published datasets.

Multimodal affective computing, learning to recognize and interpret human affects and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at abstract level, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms the state-of-the-art approaches on published datasets and we demonstrated that our model is able to visualize and interpret the synchronized attention over modalities.
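
The idea of word-level fusion with attention can be illustrated with a small sketch. This is not the paper's model: the abstract does not specify the encoders, the attention form, or the fusion operator, so everything below is an assumption made for illustration. It assumes text and audio features have already been aligned so that row i of each matrix describes the same word, uses a simple learned-query attention per modality, and fuses by concatenating the attention-weighted features word by word. The names `attend`, `word_level_fusion`, `text_query`, and `audio_query` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(features, query):
    # Score each word's feature vector against a query vector and
    # return one normalized attention weight per word.
    scores = np.tanh(features) @ query
    return softmax(scores)

def word_level_fusion(text_feats, audio_feats, text_query, audio_query):
    # text_feats:  (n_words, d_text), one row per word.
    # audio_feats: (n_words, d_audio), row i aligned to word i (assumed).
    a_text = attend(text_feats, text_query)
    a_audio = attend(audio_feats, audio_query)
    # Fuse per word: scale each modality by its own attention weight,
    # then concatenate the two modalities for the same word.
    fused = np.concatenate(
        [a_text[:, None] * text_feats, a_audio[:, None] * audio_feats],
        axis=1,
    )
    # Sum over words to get one utterance-level vector. The per-word
    # weights are returned so they can be inspected side by side.
    return fused.sum(axis=0), a_text, a_audio

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(5, 8))    # 5 words, 8-dim text features
    audio = rng.normal(size=(5, 4))   # same 5 words, 4-dim audio features
    utt, w_text, w_audio = word_level_fusion(
        text, audio, rng.normal(size=8), rng.normal(size=4)
    )
    print(utt.shape, w_text.round(3), w_audio.round(3))
```

Because the two weight vectors are indexed by the same words, they can be plotted together, which is one plausible reading of the "synchronized attention over modalities" that the abstract says the model can visualize.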
