Oh My Mistake!: Toward Realistic Dialogue State Tracking including Turnback Utterances
This work addresses a critical limitation of DST benchmark datasets for real-world conversational AI, though the contribution is incremental because it focuses on dataset construction rather than model innovation.
The study found that current dialogue state tracking (DST) models fail when users change their minds during a conversation: performance degrades significantly even in simple turnback scenarios, but it can be recovered by explicitly including such scenarios in the training data.
The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although people often change their minds during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations in which no one changes their mind. The main question inspiring the present study is: "Are current benchmark datasets sufficiently diverse to handle casual conversations in which one changes their mind after a certain topic is over?" We found that the answer is "No": DST models cannot refer to previous user preferences when template-based turnback utterances are injected into the dataset. Even in the simplest mind-changing (turnback) scenario, the performance of DST models degraded significantly. However, we found that this performance can be recovered when turnback scenarios are explicitly included in the training set, implying that the problem lies not with the DST models but with the construction of the benchmark dataset.
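To make the idea of template-based turnback injection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a dialogue is a list of turns, each carrying a user utterance and a cumulative belief state keyed by "domain-slot". The template wording, the function name `inject_turnback`, and the data layout are all hypothetical choices made for illustration.

```python
from copy import deepcopy

# Hypothetical template; the paper's actual templates are not reproduced here.
TURNBACK_TEMPLATE = (
    "Oh, my mistake! For the {domain}, please change the {slot} to {value}."
)


def inject_turnback(dialogue, domain, slot, new_value):
    """Return a copy of `dialogue` with one turnback turn appended.

    Assumed layout: each turn is a dict with a "user" utterance and a
    cumulative "state" dict mapping "domain-slot" keys to values.
    """
    key = f"{domain}-{slot}"
    last_state = dialogue[-1]["state"] if dialogue else {}
    if key not in last_state:
        # A turnback only makes sense for a slot the user already filled.
        raise ValueError(f"{key} was never set, nothing to turn back to")

    # Carry over every earlier preference, overriding only the revised slot.
    new_state = dict(last_state)
    new_state[key] = new_value

    turnback_turn = {
        "user": TURNBACK_TEMPLATE.format(
            domain=domain, slot=slot, value=new_value
        ),
        "state": new_state,
    }
    # Deep-copy so the original dialogue is left untouched.
    return deepcopy(dialogue) + [turnback_turn]


# Usage: the user finishes the hotel topic, moves on to a taxi,
# then returns to the hotel and revises the price range.
dialogue = [
    {"user": "I need a cheap hotel.",
     "state": {"hotel-pricerange": "cheap"}},
    {"user": "Also book a taxi to the station.",
     "state": {"hotel-pricerange": "cheap", "taxi-destination": "station"}},
]
augmented = inject_turnback(dialogue, "hotel", "pricerange", "expensive")
```

In this sketch the appended turn forces a tracker to revisit a slot from an already finished topic while keeping the unrelated taxi slot intact, which is the kind of reference to earlier preferences the abstract reports current models failing at. The same function could be applied to training dialogues as well as test dialogues to mirror the recovery experiment described above.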