CL · Apr 18, 2018

Dialogue Learning with Human Teaching and Feedback in End-to-End Trainable Task-Oriented Dialogue Systems

arXiv:1804.06512v1 · 145 citations
Originality: Incremental advance
AI Analysis

This work addresses efficiency issues in training task-oriented dialogue systems, but it is an incremental advance, as it builds on existing reinforcement learning and imitation learning approaches.

The authors tackle the problem of training task-oriented dialogue systems efficiently by addressing the mismatch in dialogue state distribution between the offline (supervised pre-training) and online (interactive) learning stages. They propose a hybrid imitation and reinforcement learning method in which an end-to-end trainable agent learns from human teaching and feedback: imitation learning from user corrections improves task completion, and subsequent reinforcement learning with user feedback yields further gains.

In this work, we present a hybrid learning method for training task-oriented dialogue systems through online user interactions. Popular methods for learning task-oriented dialogues include applying reinforcement learning with user feedback on top of supervised pre-trained models. The efficiency of such a learning method may suffer from the mismatch in dialogue state distribution between the offline training and online interactive learning stages. To address this challenge, we propose a hybrid imitation and reinforcement learning method, with which a dialogue agent can effectively learn from its interaction with users by learning from human teaching and feedback. We design a neural-network-based task-oriented dialogue agent that can be optimized end-to-end with the proposed learning method. Experimental results show that our end-to-end dialogue agent can learn effectively from the mistakes it makes via imitation learning from user teaching. Applying reinforcement learning with user feedback after the imitation learning stage further improves the agent's capability of successfully completing a task.
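The abstract describes a three-stage pipeline: supervised pre-training on offline data, imitation learning from user teaching, then reinforcement learning from user feedback. The minimal Python sketch below illustrates that pipeline under loudly stated assumptions. It uses a hypothetical toy task with `T` turns and `A` actions, a tabular softmax policy standing in for the neural agent, and a simulated teacher (`expert_action`). The imitation stage uses DAgger-style data aggregation, and the RL stage uses REINFORCE with a binary success reward. None of these names or choices come from the paper or its code release, and the paper's exact protocol may differ.

```python
import math
import random

# Hypothetical toy task: a dialogue lasts T turns and the agent picks one of A
# system actions per turn. A state is (turn, off_track); off_track becomes True
# after a wrong action, i.e. a state the offline expert data never contains.
T, A = 5, 4

def expert_action(state):
    """Simulated user teacher: the correct action for a state (assumed rule)."""
    turn, off_track = state
    return (turn + 1) % A if off_track else turn % A

def step(state, action):
    """Advance one turn; a wrong action moves the dialogue off track."""
    turn, _ = state
    return (turn + 1, action != expert_action(state))

class TabularPolicy:
    """Softmax policy with one logit vector per state, standing in for the
    paper's neural dialogue agent."""
    def __init__(self):
        self.logits = {}

    def probs(self, state):
        z = self.logits.get(state, [0.0] * A)
        m = max(z)
        e = [math.exp(v - m) for v in z]
        total = sum(e)
        return [v / total for v in e]

    def sample(self, state, rng):
        return rng.choices(range(A), weights=self.probs(state))[0]

    def greedy(self, state):
        p = self.probs(state)
        return p.index(max(p))

    def update(self, state, action, scale):
        """Gradient step: logits += scale * d log pi(action | state) / d logits."""
        p = self.probs(state)
        z = self.logits.setdefault(state, [0.0] * A)
        for a in range(A):
            z[a] += scale * ((1.0 if a == action else 0.0) - p[a])

def rollout(policy, rng):
    """One dialogue with the current policy. Success (assumed definition):
    the final turn ends on track."""
    state, episode = (0, False), []
    for _ in range(T):
        action = policy.sample(state, rng)
        episode.append((state, action))
        state = step(state, action)
    return episode, (0.0 if state[1] else 1.0)

def supervised_train(policy, data, epochs=1, lr=1.0):
    """Cross-entropy SGD on (state, correct_action) pairs."""
    for _ in range(epochs):
        for state, label in data:
            policy.update(state, label, lr)

def imitation_learning(policy, data, rounds, rng):
    """DAgger-style loop: the agent acts on its own, the user labels the
    states it actually reached (including off-track ones), and the agent
    retrains on the aggregated data."""
    data = list(data)
    for _ in range(rounds):
        episode, _ = rollout(policy, rng)
        data += [(s, expert_action(s)) for s, _ in episode]
        supervised_train(policy, data)
    return data

def reinforce(policy, episodes, rng, lr=0.1):
    """REINFORCE with a binary task-success reward and an exponential
    moving-average baseline; returns the success rate during this stage."""
    baseline, total = 0.0, 0.0
    for _ in range(episodes):
        episode, reward = rollout(policy, rng)
        for state, action in episode:
            policy.update(state, action, lr * (reward - baseline))
        baseline = 0.9 * baseline + 0.1 * reward
        total += reward
    return total / episodes

if __name__ == "__main__":
    rng = random.Random(0)
    agent = TabularPolicy()
    offline = [((t, False), expert_action((t, False))) for t in range(T)]
    supervised_train(agent, offline)                    # stage 1: offline SL
    data = imitation_learning(agent, offline, 20, rng)  # stage 2: user teaching
    rate = reinforce(agent, 200, rng)                   # stage 3: user feedback
    print(f"labelled states: {len(set(s for s, _ in data))}")
    print(f"success rate during RL: {rate:.2f}")
```

Off-track states `(turn, True)` never appear in the offline data, so after stage 1 the agent's policy in them is still uniform. Stage 2 adds exactly those states once the agent's own mistakes reach them, which is one way to picture the state-distribution mismatch the abstract refers to.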

Code Implementations: 1 repo