CL LG NE

Apr 23, 2020

Learning Dialog Policies from Weak Demonstrations

Gabriel Gordon-Hall, Philip John Gorinski, Shay B. Cohen

arXiv:2004.11054v231.21006 citations

Originality: Incremental advance

AI Analysis

This work addresses efficient dialog policy learning for multi-domain systems. It is an incremental advance because it builds on existing methods rather than introducing a new framework.

The paper tackles the challenge of training dialog managers in large state and action spaces by extending Deep Q-learning from Demonstrations with Reinforced Fine-tune Learning, achieving high success rates in multi-domain dialog systems even with out-of-domain data.

Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train expert demonstrators. We introduce Reinforced Fine-tune Learning, an extension to DQfD, enabling us to overcome the domain gap between the datasets and the environment. Experiments in a challenging multi-domain dialog system framework validate our approaches, which achieve high success rates even when trained on out-of-domain data.
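To make the DQfD idea mentioned in the abstract concrete, the sketch below shows the two loss terms that the original DQfD formulation combines: a one-step temporal-difference loss and a large-margin supervised loss that pushes the demonstrator's action above all others. This is a simplified illustration and not the paper's code. The function names, the margin of 0.8, the discount of 0.99, and the weight `lambda_e` are assumptions chosen for the example, and the n-step and regularization terms of full DQfD are left out. Reinforced Fine-tune Learning itself is not sketched here because the abstract does not describe its mechanics.

```python
def large_margin_loss(q_values, expert_action, margin=0.8):
    """Supervised DQfD-style loss for a single state.

    q_values: list of Q(s, a), one entry per action.
    expert_action: index of the action the demonstrator took.
    margin: penalty added to every non-expert action (assumed value).
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    # Zero only when the expert action beats every other action by `margin`.
    return max(augmented) - q_values[expert_action]


def one_step_td_loss(q_sa, reward, next_q_values, done, gamma=0.99):
    """Squared one-step TD error (no target network, for brevity)."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_sa) ** 2


def dqfd_loss(q_values, action, reward, next_q_values, done,
              is_demo, lambda_e=1.0):
    """TD loss, plus the margin loss when the transition is a demonstration."""
    td = one_step_td_loss(q_values[action], reward, next_q_values, done)
    supervised = large_margin_loss(q_values, action) if is_demo else 0.0
    return td + lambda_e * supervised
```

In this sketch, transitions generated by the agent itself contribute only the TD term, while demonstration transitions also contribute the margin term. That is how demonstrations can steer the policy before the agent has collected much reward signal of its own.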
