S+PAGE: A Speaker and Position-Aware Graph Neural Network Model for Emotion Recognition in Conversation
This work addresses emotion recognition in conversation, a key task for applications such as human-computer interaction. Its contribution is incremental: it builds on existing methods by combining a Transformer with graph neural networks.
The paper tackles the problem of emotion recognition in conversation by proposing S+PAGE, a model that integrates Transformer and relational graph convolution networks to better model self and inter-speaker contexts, achieving improved performance on benchmark datasets.
Emotion recognition in conversation (ERC) has attracted much attention in recent years because of its importance in a wide range of applications. Existing ERC methods mostly model the self and inter-speaker contexts separately, which leaves too little interaction between them. In this paper, we propose a novel Speaker and Position-Aware Graph neural network model for ERC (S+PAGE), which contains three stages that combine the benefits of both the Transformer and the relational graph convolution network (R-GCN) for better contextual modeling. First, a two-stream conversational Transformer extracts coarse self and inter-speaker contextual features for each utterance. Then, a speaker and position-aware conversation graph is constructed, and we propose an enhanced R-GCN model, called PAG, to refine the coarse features under the guidance of a relative positional encoding. Finally, the features from both of the preceding stages are fed into a conditional random field layer to model the emotion transfer.
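To make the second stage more concrete, the following is a minimal sketch of how a speaker and position-aware conversation graph might be built from a list of speaker labels. It is an illustration under stated assumptions, not the construction used in S+PAGE: the abstract does not specify the edge set, so the context window (`window`), the clipping distance (`max_dist`), the two relation names (`"self"`, `"inter"`), and the helper names `Edge` and `build_conversation_graph` are all hypothetical.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    src: int       # index of the context utterance
    dst: int       # index of the utterance whose features are refined
    relation: str  # "self" if both utterances share a speaker, else "inter"
    rel_pos: int   # relative position (src - dst), clipped to [-max_dist, max_dist]


def build_conversation_graph(
    speakers: List[str], window: int = 2, max_dist: int = 2
) -> List[Edge]:
    """Connect each utterance to its neighbours within `window` turns.

    Assumption: a fixed symmetric window and two relation types; the paper
    may define edges and relations differently.
    """
    edges: List[Edge] = []
    n = len(speakers)
    for dst in range(n):
        first = max(0, dst - window)
        last = min(n - 1, dst + window)
        for src in range(first, last + 1):
            # Speaker awareness: the relation type depends on who is talking.
            relation = "self" if speakers[src] == speakers[dst] else "inter"
            # Position awareness: signed distance, clipped to a fixed range
            # so that it can index a small table of positional embeddings.
            rel_pos = max(-max_dist, min(max_dist, src - dst))
            edges.append(Edge(src, dst, relation, rel_pos))
    return edges
```

For a three-turn dialogue with speakers `["A", "B", "A"]` and `window=1`, each utterance is linked to itself and to its immediate neighbours. A relational graph layer could then use `relation` to select a weight matrix and `rel_pos` to look up a positional bias, which is one plausible reading of "guided by a relative positional encoding".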