The Interplay of Task Success and Dialogue Quality: An In-depth Evaluation in Task-Oriented Visual Dialogues
This paper addresses a methodological issue in training task-oriented dialogue systems: selecting the best model by task success alone. It offers incremental insights into balancing task performance and language quality.
The study finds that selecting models by task success in referential dialogue games hinders the learning of richer language, because language proficiency develops more slowly than task mastery. It also shows that, in GuessWhat, better language quality could raise task accuracy through better handling of infrequent words.
When training a model on referential dialogue guessing games, the best model is usually chosen based on its task success. We show that in the popular end-to-end approach, this choice prevents the model from learning to generate linguistically richer dialogues, since the acquisition of language proficiency takes longer than learning the guessing task. By comparing models playing different games (GuessWhat, GuessWhich, and Mutual Friends), we show that this discrepancy is model- and task-agnostic. We investigate whether and when better language quality could lead to higher task success. We show that in GuessWhat, models could increase their accuracy if they also learned to ground, encode, and decode words that occur infrequently in the training set.