CL · Dec 29, 2020

A Hierarchical Transformer with Speaker Modeling for Emotion Recognition in Conversation

Jiangnan Li, Zheng Lin, Peng Fu, Qingyi Si, Weiping Wang

arXiv:2012.14781v11.019 citations · Has Code

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement for the task of emotion recognition in conversational AI systems by offering a more computationally efficient way to model speaker interactions.

This paper addresses the computational expense and limited context of current speaker interaction models in Emotion Recognition in Conversation (ERC) by simplifying speaker dependencies into binary Intra-Speaker and Inter-Speaker categories. The authors design a hierarchical Transformer with three distinct masking strategies to model these dependencies, achieving improved performance on two ERC datasets.

Emotion Recognition in Conversation (ERC) is a more challenging task than conventional text emotion recognition. It can be regarded as a personalized and interactive emotion recognition task, which must consider not only the semantic information of the text but also the influence of the speakers. Current methods model speakers' interactions by building a relation between every pair of speakers. However, this fine-grained but complicated modeling is computationally expensive, hard to extend, and can only consider local context. To address this problem, we simplify the modeling to a binary version, Intra-Speaker and Inter-Speaker dependencies, without identifying every unique speaker relative to the target speaker. To realize this simplified speaker interaction modeling in a Transformer, which excels at capturing long-distance dependencies, we design three types of masks and apply each in one of three independent Transformer blocks. The masks model conventional context, Intra-Speaker dependency, and Inter-Speaker dependency, respectively. Furthermore, because the speaker-aware information extracted by the different Transformer blocks contributes differently to the prediction, we use an attention mechanism to weight it automatically. Experiments on two ERC datasets indicate that our model achieves better performance.
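
The three masks described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_speaker_masks`, the boolean convention (True means "may attend"), and the choice to keep the diagonal open in the Inter-Speaker mask (so that no utterance ends up with an empty attention row) are all assumptions made for illustration. The abstract itself does not specify these details.

```python
from typing import Hashable, List, Sequence, Tuple

Mask = List[List[bool]]


def build_speaker_masks(speakers: Sequence[Hashable]) -> Tuple[Mask, Mask, Mask]:
    """Build three utterance-level attention masks from per-utterance speaker ids.

    Hypothetical helper; True means utterance i may attend to utterance j.
    Returns (conventional, intra_speaker, inter_speaker).
    """
    n = len(speakers)

    # Conventional context: every utterance may attend to every utterance.
    conventional = [[True] * n for _ in range(n)]

    # Intra-Speaker: attend only to utterances from the same speaker.
    intra = [[speakers[i] == speakers[j] for j in range(n)] for i in range(n)]

    # Inter-Speaker: attend only to utterances from any other speaker.
    # Assumption: the diagonal stays open so a row is never fully masked.
    inter = [
        [i == j or speakers[i] != speakers[j] for j in range(n)]
        for i in range(n)
    ]
    return conventional, intra, inter


if __name__ == "__main__":
    # Speakers are only compared for equality; no per-speaker identity is modeled.
    conv, intra, inter = build_speaker_masks(["A", "B", "A", "C"])
    for name, mask in (("conventional", conv), ("intra", intra), ("inter", inter)):
        print(name)
        for row in mask:
            print("  " + " ".join("1" if allowed else "0" for allowed in row))
```

Note that the masks depend only on whether two utterances share a speaker, which mirrors the binary simplification: a conversation with many participants needs the same three masks as a two-party dialogue. In a real model, each mask would be passed to its own Transformer block, and the three outputs would then be combined by learned attention weights.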

