Multilogue-Net: A Context Aware RNN for Multi-modal Emotion Detection and Sentiment Analysis in Conversation
This work addresses a key challenge in real-world applications like conversational AI by improving multi-modal emotion detection, though it appears incremental as it builds on existing RNN approaches.
The paper tackles multi-modal emotion detection and sentiment analysis in conversations by proposing a context-aware RNN architecture. The architecture targets three limitations of existing systems: weak use of conversation context, unmodeled dependencies between speaker and listener emotional states, and unmodeled relationships between modalities. The authors report that, at the time of writing, the model outperforms state-of-the-art methods on a benchmark dataset across a variety of accuracy and regression metrics.
Sentiment Analysis and Emotion Detection in conversation are key to several real-world applications, and an increase in the number of available modalities aids a better understanding of the underlying emotions. Multi-modal Emotion Detection and Sentiment Analysis can be particularly useful, as applications can use whichever subset of modalities the available data provides. Current multi-modal systems fail to leverage and capture three things: the context of the conversation across all modalities, the dependency between the emotional states of the speaker and the listener(s), and the relevance of and relationships between the available modalities. In this paper, we propose an end-to-end RNN architecture that attempts to address all of these drawbacks. At the time of writing, our proposed model outperforms the state of the art on a benchmark dataset on a variety of accuracy and regression metrics.