Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?
This is a position paper that identifies the limitations of task-oriented dialogue systems in real-world applications and proposes a more scalable direction.
The paper argues that current task-oriented dialogue models, which are built on slot-filling frameworks, remain limited in automating real-world scenarios despite their success on benchmarks. As an alternative, it explores the WebTOD framework, which uses large-scale language models to understand web/mobile interfaces.
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework has achieved remarkable success on various TOD benchmarks. However, we argue that current TOD benchmarks are only limited surrogates for real-world scenarios and that current TOD models are still a long way from covering such scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. We then explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system, powered by a large-scale language model, learns to understand the web/mobile interface that the human agent interacts with.
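To make the slot-filling idea concrete, here is a minimal Python sketch of an SF-TOD dialogue frame. It is not the system described in the paper, and everything in it is a hypothetical illustration. It assumes a restaurant-booking task with four slots (`cuisine`, `area`, `people`, `time`) and a `parsed` dict standing in for the output of an NLU or state-tracking model. The sketch shows how a task is reduced to a fixed set of slots and why a request outside that schema (here a hypothetical `wheelchair_access` attribute) has nowhere to go.

```python
from dataclasses import dataclass, field

# Hypothetical schema for illustration only: one task, a fixed set of slots.
RESTAURANT_SLOTS = ("cuisine", "area", "people", "time")


@dataclass
class DialogueState:
    """Slot-filling frame: the task is reduced to a predefined set of slots."""

    slots: dict = field(default_factory=lambda: {s: None for s in RESTAURANT_SLOTS})
    unsupported: list = field(default_factory=list)

    def update(self, parsed: dict) -> None:
        # `parsed` is a stand-in for the output of an NLU / state-tracking model.
        for name, value in parsed.items():
            if name in self.slots:
                self.slots[name] = value
            else:
                # A user request outside the predefined schema has no slot to fill.
                self.unsupported.append(name)

    def next_action(self) -> str:
        # Ask for the first missing slot; once all slots are filled, execute the task.
        missing = [s for s, v in self.slots.items() if v is None]
        if missing:
            return f"request({missing[0]})"
        return "book()"


state = DialogueState()
state.update({"cuisine": "thai", "area": "centre"})
print(state.next_action())  # request(people)
state.update({"people": 2, "time": "19:00", "wheelchair_access": True})
print(state.next_action())  # book()
print(state.unsupported)    # ['wheelchair_access']
```

The frame is easy to control because every turn maps onto the same few slots. That same rigidity is what limits coverage, since each new kind of request requires extending the schema by hand.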