AI · LG · ML · Dec 18, 2016

Sample-efficient Deep Reinforcement Learning for Dialog Control

arXiv:1612.06000v1 · 21 citations
Originality: Incremental advance
AI Analysis

This work addresses the high data requirements of training dialog policies with reinforcement learning. The advance is incremental: it builds on existing methods to improve sample efficiency.

The paper tackles the sample inefficiency of policy gradient methods in deep reinforcement learning for dialog control. It introduces three methods that combine a value-predicting RNN with experience replay, reducing the number of dialogs required by about one-third compared with standard policy gradient methods.

Representing a dialog policy as a recurrent neural network (RNN) is attractive because it handles partial observability, infers a latent representation of state, and can be optimized with supervised learning (SL) or reinforcement learning (RL). For RL, a policy gradient approach is natural, but is sample inefficient. In this paper, we present 3 methods for reducing the number of dialogs required to optimize an RNN-based dialog policy with RL. The key idea is to maintain a second RNN which predicts the value of the current policy, and to apply experience replay to both networks. On two tasks, these methods reduce the number of dialogs/episodes required by about a third, vs. standard policy gradient methods.
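
As a rough illustration of the two ingredients the abstract names (a value estimate for the current policy, and experience replay), the Python sketch below computes discounted returns for one dialog, subtracts a value prediction to form an advantage signal, and keeps past dialogs in a fixed-size replay buffer. It is not the paper's implementation: the RNNs are replaced by placeholder numbers, and every name here (`discounted_returns`, `advantages`, `ReplayBuffer`), the discount factor, and the toy rewards are assumptions chosen for illustration.

```python
import random
from collections import deque


def discounted_returns(rewards, gamma=0.95):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one dialog."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, value_predictions):
    """Advantage = observed return minus the value model's prediction."""
    return [g - v for g, v in zip(returns, value_predictions)]


class ReplayBuffer:
    """Fixed-size store of past dialogs that can be re-sampled for updates."""

    def __init__(self, capacity):
        # Oldest dialogs are dropped automatically once capacity is reached.
        self.episodes = deque(maxlen=capacity)

    def add(self, episode):
        self.episodes.append(episode)

    def sample(self, batch_size, rng=random):
        # Never ask for more dialogs than the buffer currently holds.
        return rng.sample(list(self.episodes), min(batch_size, len(self.episodes)))


# One hypothetical three-turn dialog that is rewarded only on the last turn.
rewards = [0.0, 0.0, 1.0]
# Placeholder outputs standing in for a value network (assumption).
value_predictions = [0.5, 0.5, 0.5]

returns = discounted_returns(rewards, gamma=0.5)
adv = advantages(returns, value_predictions)

buffer = ReplayBuffer(capacity=2)
for episode_id in range(3):
    buffer.add({"id": episode_id, "rewards": rewards})
```

In a policy gradient update, an advantage of this kind would typically scale the gradient of the log-probability of each action taken, and re-sampling dialogs from the buffer is one way to reuse collected data for more than one update. How the paper's three methods combine these pieces is not specified in the abstract.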

