LG NE ML · Aug 17, 2016

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

arXiv:1608.05081v4 · 23.879 citations

Originality: Incremental advance

AI Analysis

This work targets inefficient exploration in reinforcement learning for dialogue systems and is relevant to researchers and practitioners in that area. It represents an incremental improvement over existing methods.

The paper tackles inefficient exploration in deep Q-learning for task-oriented dialogue systems. It introduces an algorithm that explores via Thompson sampling over a Bayes-by-Backprop neural network, and reports much faster learning than common exploration strategies such as $\epsilon$-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based exploration.

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems. Our agents explore via Thompson sampling, drawing Monte Carlo samples from a Bayes-by-Backprop neural network. Our algorithm learns much faster than common exploration strategies such as $\epsilon$-greedy, Boltzmann, bootstrapping, and intrinsic-reward-based ones. Additionally, we show that spiking the replay buffer with experiences from just a few successful episodes can make Q-learning feasible when it might otherwise fail.
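The two ideas in the abstract, Thompson sampling from a weight posterior and spiking the replay buffer, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear Q-function in place of a deep network, a factorized Gaussian posterior parameterized by `mu` and `rho` (with standard deviation `softplus(rho)`, as is common in Bayes-by-Backprop), and it omits training entirely. The names `GaussianLinearQ`, `thompson_action`, and `spike_replay_buffer` are hypothetical.

```python
from collections import deque

import numpy as np


class GaussianLinearQ:
    """Hypothetical linear Q-function with a factorized Gaussian weight posterior.

    A simplification for illustration: the paper uses a neural network,
    and the posterior parameters would be learned, not fixed as here.
    """

    def __init__(self, state_dim, n_actions, rng):
        self.rng = rng
        # Posterior mean of each weight.
        self.mu = rng.normal(0.0, 0.1, size=(state_dim, n_actions))
        # Unconstrained parameter; sigma = softplus(rho) keeps the std positive.
        self.rho = np.full((state_dim, n_actions), -3.0)

    def sigma(self):
        return np.log1p(np.exp(self.rho))  # softplus

    def sample_weights(self):
        # Reparameterized draw: w = mu + sigma * eps, with eps ~ N(0, I).
        eps = self.rng.standard_normal(self.mu.shape)
        return self.mu + self.sigma() * eps

    def q_values(self, state, weights):
        return state @ weights


def thompson_action(model, state):
    """Draw one weight sample, then act greedily with respect to it."""
    weights = model.sample_weights()
    return int(np.argmax(model.q_values(state, weights)))


def spike_replay_buffer(buffer, successful_episodes):
    """Pre-fill the buffer with transitions from a few successful episodes."""
    for episode in successful_episodes:
        buffer.extend(episode)
    return buffer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = GaussianLinearQ(state_dim=4, n_actions=3, rng=rng)
    state = np.ones(4)
    print("sampled action:", thompson_action(model, state))

    # Transitions are (state, action, reward, next_state, done) placeholders.
    demo = [[("s0", 1, 0.0, "s1", False), ("s1", 2, 1.0, "s2", True)]]
    buffer = spike_replay_buffer(deque(maxlen=1000), demo)
    print("buffer size after spiking:", len(buffer))
```

Exploration here comes from the posterior itself: where the weights are uncertain, different samples favor different actions, and as the posterior sharpens the agent's choices converge toward the greedy action under the mean weights.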
