InferEM: Inferring the Speaker's Intention for Empathetic Dialogue Generation
This work addresses the challenge of generating more empathetic responses in dialogue systems, which matters for applications such as mental health support and customer service. The contribution is incremental: it builds on existing methods by adding intention modeling.
The paper tackles the problem of generating empathetic dialogue responses by focusing on the speaker's intention, which is often neglected in existing methods that encode the entire dialogue history. The proposed InferEM model separately encodes the last utterance to capture intention and uses multi-task learning, showing improved empathetic expression in experiments.
Current approaches to empathetic response generation typically encode the entire dialogue history directly and feed the output to a decoder to generate friendly feedback. These methods focus on modelling contextual information but neglect the direct intention of the speaker. We argue that the last utterance in the dialogue empirically conveys the intention of the speaker. Consequently, we propose a novel model named InferEM for empathetic response generation. We separately encode the last utterance and fuse it with the entire dialogue through a multi-head attention-based intention fusion module to capture the speaker's intention. In addition, we use the previous utterances to predict the last utterance, which mimics the human tendency to anticipate what the interlocutor will say. To balance the optimization rates of utterance prediction and response generation, we design a multi-task learning strategy for InferEM. Experimental results demonstrate the plausibility and validity of InferEM in improving empathetic expression.
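To make the fusion idea concrete, here is a minimal pure-Python sketch of multi-head attention in which the encoded dialogue attends over the separately encoded last utterance. It is an illustration under stated assumptions, not the InferEM implementation: the function names (`fuse_intention`, `attention_head`, `split_heads`), the choice of the dialogue as queries and the last utterance as keys and values, the residual sum, and the omission of learned projection matrices are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(queries, keys, values):
    """Scaled dot-product attention for one head; inputs are lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def split_heads(vectors, num_heads):
    """Slice each vector into num_heads contiguous chunks (no learned projection)."""
    head_dim = len(vectors[0]) // num_heads
    return [
        [v[h * head_dim:(h + 1) * head_dim] for v in vectors]
        for h in range(num_heads)
    ]


def fuse_intention(context, last_utterance, num_heads=2):
    """Hypothetical fusion: context tokens attend over last-utterance tokens,
    and the attended vectors are added back to the context (assumed residual)."""
    if len(context[0]) % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    q_heads = split_heads(context, num_heads)
    kv_heads = split_heads(last_utterance, num_heads)
    head_outputs = [attention_head(q, kv, kv) for q, kv in zip(q_heads, kv_heads)]
    fused = []
    for i, token in enumerate(context):
        # Concatenate the per-head outputs for token i, then add the residual.
        attended = [x for head in head_outputs for x in head[i]]
        fused.append([t + a for t, a in zip(token, attended)])
    return fused


# Toy encodings: 3 context tokens and 2 last-utterance tokens, hidden size 4.
context = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
last_utterance = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
fused = fuse_intention(context, last_utterance)
```

The output keeps the shape of the dialogue encoding, so under this assumption it could be passed to a decoder in place of the plain context representation. A trained model would additionally apply learned query, key, value, and output projections.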