CL · Sep 28, 2021

"How Robust r u?": Evaluating Task-Oriented Dialogue Systems on Spoken Conversations

Seokhwan Kim, Yang Liu, Di Jin, Alexandros Papangelis, Karthik Gopalakrishnan, Behnam Hedayatnia, Dilek Hakkani-Tur

arXiv:2109.13489v14.344 citations

Originality Synthesis-oriented

AI Analysis

This work addresses the gap in evaluating dialogue systems for practical spoken applications, though it is incremental as it builds on existing methods with new data.

The authors built a new benchmark data set for evaluating task-oriented dialogue systems on spoken conversations. They found that existing state-of-the-art models trained on written data perform poorly on the spoken data, and that performance improves when n-best speech recognition hypotheses are used.

Most prior work in dialogue modeling has been on written conversations, largely because of the data sets available. However, written dialogues are not sufficient to fully capture the nature of spoken conversations or the speech recognition errors that arise in practical spoken dialogue systems. This work presents a new benchmark on spoken task-oriented conversations, intended for studying multi-domain dialogue state tracking and knowledge-grounded dialogue modeling. We report that, as expected, the existing state-of-the-art models trained on written conversations do not perform well on our spoken data. Furthermore, we observe improvements in task performance when leveraging n-best speech recognition hypotheses, for example by combining predictions based on individual hypotheses. Our data set enables speech-based benchmarking of task-oriented dialogue systems.
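The abstract mentions combining predictions made on individual n-best hypotheses but does not say how the combination is done. The sketch below shows one plausible scheme, confidence-weighted voting over per-hypothesis dialogue states. It is an illustration under stated assumptions, not the authors' method: the `(transcript, confidence)` hypothesis format, the `combine_nbest` function, and the keyword-based `toy_predict` tracker are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed formats (not from the paper): an ASR hypothesis is a
# (transcript, confidence) pair; a dialogue state maps slot -> value.
Hypothesis = Tuple[str, float]
State = Dict[str, str]


def combine_nbest(hypotheses: List[Hypothesis],
                  predict: Callable[[str], State]) -> State:
    """Run `predict` on every hypothesis and keep, for each slot, the
    value with the largest total normalized ASR confidence."""
    total = sum(score for _, score in hypotheses)
    if total <= 0:
        raise ValueError("hypothesis confidences must sum to a positive value")

    votes: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for text, score in hypotheses:
        weight = score / total  # normalize so the weights sum to 1
        for slot, value in predict(text).items():
            votes[slot][value] += weight

    return {slot: max(values, key=values.get) for slot, values in votes.items()}


def toy_predict(text: str) -> State:
    """Hypothetical stand-in for a trained state tracker: keyword spotting."""
    state: State = {}
    for area in ("north", "south"):
        if area in text.split():
            state["hotel-area"] = area
    return state


nbest = [
    ("a hotel in the north", 0.5),         # top-1 hypothesis
    ("a hotel in the south", 0.3),
    ("a hotel in the south please", 0.3),
]
print(combine_nbest(nbest, toy_predict))
```

In this toy input the top-1 hypothesis says "north", but the two lower-ranked hypotheses agree on "south" and together carry more confidence mass, so the combined state selects "south". Other combination schemes, such as averaging model output distributions, fit the same description in the abstract.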
