CL · LG · Apr 19, 2021

Alexa Conversations: An Extensible Data-driven Approach for Building Task-oriented Dialogue Systems

arXiv:2104.09088v1735 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of letting developers build robust dialogue systems without extensive annotated data; its contribution to data efficiency is judged incremental.

The paper tackles the problem of building scalable, data-efficient task-oriented dialogue systems. It introduces Alexa Conversations, which uses a novel dialogue simulator to generate training data from a few seed dialogues and API specifications, reducing developer burden. The authors report that the simulator leads to an over 50% improvement in turn-level action signature prediction accuracy.

Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning and response generation. Training each component requires annotations which are hard to obtain for every new domain, limiting scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-End dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of data for training. To overcome these problems, in this demo, we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible as well as data efficient. The components of this system are trained in a data-driven manner, but instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena like entity sharing across turns or users changing their mind during conversation without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden for creating a robust experience. Finally, we evaluate our system using a typical movie ticket booking task and show that the dialogue simulator is an essential component of the system that leads to over $50\%$ improvement in turn-level action signature prediction accuracy.
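
To make the simulator idea in the abstract concrete, here is a minimal sketch of how training dialogues might be expanded from a single seed dialogue and developer-provided entity catalogs. It is not the authors' simulator: the `Turn` structure, the `simulate` function, the action labels such as `call_OrderPizza`, and the pizza catalogs are all hypothetical names chosen for illustration, and the sketch only resamples entity values. It does not model entity sharing across turns or users changing their mind, which the abstract says the real system supports.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    speaker: str   # "user" or "system"
    template: str  # utterance with {slot} placeholders
    action: str    # hypothetical action label for this turn


# Hypothetical developer inputs: entity catalogs and one seed dialogue.
CATALOGS = {
    "size": ["small", "medium", "large"],
    "topping": ["mushroom", "pepperoni", "olive"],
}

SEED = [
    Turn("user", "I want a {size} {topping} pizza", "inform"),
    Turn("system", "Ordering a {size} {topping} pizza, ok?", "confirm_OrderPizza"),
    Turn("user", "yes", "affirm"),
    Turn("system", "Your {size} {topping} pizza is on its way", "call_OrderPizza"),
]


def simulate(seed, catalogs, n, rng):
    """Generate n dialogue variants by resampling entity values.

    Each variant keeps the seed's turn order and action labels, so the
    output can serve as (utterance, action) training pairs.
    """
    dialogues = []
    for _ in range(n):
        # One value per slot, held fixed for the whole dialogue.
        values = {slot: rng.choice(options) for slot, options in catalogs.items()}
        dialogues.append(
            [(t.speaker, t.template.format(**values), t.action) for t in seed]
        )
    return dialogues


if __name__ == "__main__":
    for speaker, text, action in simulate(SEED, CATALOGS, 1, random.Random(0))[0]:
        print(f"{speaker:6s} [{action}] {text}")
```

Holding the sampled values fixed within a dialogue keeps the system's confirmation consistent with what the user asked for; a fuller simulator would also vary the dialogue flow itself rather than only the entity values.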

