Diversifying Neural Dialogue Generation via Negative Distillation
This work addresses the generic response problem in neural dialogue generation, which limits practical applications. However, the contribution is incremental because it builds on existing negative training approaches.
The paper tackles the generic response problem in generative dialogue models by proposing negative distillation, a novel negative training paradigm. A negative teacher model produces query-wise generic responses, and the student model is trained to maximize its distance from this multi-level negative knowledge. The method significantly outperforms previous negative training methods.
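To make the objective more concrete, here is a minimal plain-Python sketch of a loss with the shape described above. The student fits the reference response with negative log-likelihood while being pushed away from a negative teacher at two levels. Several details are assumptions made for illustration and are not the paper's exact formulation: the choice of levels (output distributions and hidden states), the distance measures (KL divergence and mean squared error), and the weights `alpha` and `beta`. All function names are hypothetical.

```python
import math


# Hypothetical helpers; the distance measures are illustrative assumptions.
def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_squared_error(a, b):
    """Mean squared distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def negative_distillation_loss(gold_ids, student_probs, teacher_probs,
                               student_hidden, teacher_hidden,
                               alpha=0.5, beta=0.5):
    """Sketch: NLL on the reference minus weighted distances to a negative teacher.

    student_probs / teacher_probs: one probability list per decoding step.
    student_hidden / teacher_hidden: one hidden-state vector per decoding step.
    alpha, beta: hypothetical weights for the two negative-knowledge levels.
    """
    steps = len(gold_ids)
    # Fit the reference response (teacher-forced negative log-likelihood).
    nll = -sum(math.log(student_probs[t][y]) for t, y in enumerate(gold_ids)) / steps
    # Prediction level (assumed): per-step KL(teacher || student), averaged.
    pred_dist = sum(kl_divergence(pt, ps)
                    for pt, ps in zip(teacher_probs, student_probs)) / steps
    # Hidden level (assumed): per-step MSE between hidden states, averaged.
    hid_dist = sum(mean_squared_error(ht, hs)
                   for ht, hs in zip(teacher_hidden, student_hidden)) / steps
    # Subtracting the distances means that minimizing the loss maximizes them.
    return nll - alpha * pred_dist - beta * hid_dist


# Toy example: 2 decoding steps over a 3-token vocabulary.
gold = [0, 1]
student = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
teacher = [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]  # teacher favors a "generic" token 2
h_student = [[0.5, -0.5], [0.3, 0.1]]
h_teacher = [[0.4, 0.0], [0.0, 0.2]]

loss = negative_distillation_loss(gold, student, teacher, h_student, h_teacher)
```

When the teacher and student agree exactly, both distance terms vanish and the loss reduces to the plain NLL. Because the subtracted distances are unbounded, minimizing this loss naively could let them dominate. A practical implementation would likely need to bound or clip them, and this sketch does not do that.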
Generative dialogue models suffer badly from the generic response problem, which limits their applications to a few toy scenarios. Recently, an interesting approach, namely negative training, has been proposed to alleviate this problem by reminding the model not to generate high-frequency responses during training. However, its performance is hindered by two issues: it ignores low-frequency but generic responses, and it introduces low-frequency but meaningless responses. In this paper, we propose a novel negative training paradigm, called negative distillation, that keeps the model away from undesirable generic responses while avoiding the above problems. First, we introduce a negative teacher model that can produce query-wise generic responses. The student model is then required to maximize its distance from the multi-level negative knowledge. Empirical results show that our method significantly outperforms previous negative training methods.
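The frequency-based idea behind negative training can be illustrated with a short sketch. It counts how often each response occurs and flags those above a relative-frequency threshold as negative samples. Exact-string matching, the function name `select_negative_responses`, and the threshold value are all assumptions for illustration. They are not the settings of the original negative training method.

```python
from collections import Counter


def select_negative_responses(responses, threshold=0.05):
    """Hypothetical helper: flag responses whose relative frequency exceeds `threshold`.

    Responses are compared by exact string match (an assumption of this sketch).
    """
    counts = Counter(responses)
    total = len(responses)
    return {r for r, c in counts.items() if c / total > threshold}


# Toy corpus of 100 responses: one very frequent generic reply,
# one rare but equally generic reply, and 68 distinct informative replies.
responses = (["i don't know ."] * 30
             + ["i'm not sure ."] * 2
             + [f"response {i}" for i in range(68)])

negatives = select_negative_responses(responses, threshold=0.05)
print(negatives)  # → {"i don't know ."}
```

In this toy data, "i'm not sure ." is just as generic, but its frequency of 2% falls below the threshold, so it is never flagged. This is the first issue the abstract identifies: frequency alone misses low-frequency but generic responses.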