Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation
This work addresses the problem of inefficient exploration in deep reinforcement learning for dialogue management, which leads to a poor user experience during the learning phase.
This paper explores methods to extract uncertainty estimates from deep Q-networks (DQN) for dialogue policy optimisation, aiming to improve exploration efficiency and user experience. It benchmarks several deep Bayesian methods, namely Bayes-By-Backprop, dropout, concrete dropout, bootstrapped ensembles, and alpha-divergences, each combined with the DQN algorithm.
In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on epsilon-greedy exploration, which subjects the user to randomly chosen actions during learning. Alternative approaches such as Gaussian Process SARSA (GPSARSA) estimate uncertainties and are sample efficient, leading to a better user experience, but at the expense of greater computational complexity. This paper examines approaches to extract uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform an extensive benchmark of deep Bayesian methods for extracting uncertainty estimates, namely Bayes-By-Backprop, dropout, concrete dropout, bootstrapped ensembles and alpha-divergences, combining each with the DQN algorithm.
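To make the contrast between epsilon-greedy exploration and uncertainty-driven exploration concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the Q-values for the current belief state are already available as plain lists, and that uncertainty comes from a bootstrapped ensemble of Q-heads whose disagreement is used for Thompson-style action selection. The function names (`epsilon_greedy`, `ensemble_thompson`, `ensemble_stats`) and the toy numbers are hypothetical.

```python
import random


def greedy(q_values):
    """Index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one.

    The random branch ignores how confident the network is, which is why the
    user may be exposed to arbitrary system actions during learning.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)


def ensemble_thompson(q_heads, rng):
    """Sample one head of the (assumed) bootstrapped ensemble and act greedily on it.

    q_heads holds one list of Q-values per head for the same belief state.
    Where the heads agree, the choice is effectively greedy; where they
    disagree, different heads propose different actions, so exploration is
    concentrated on uncertain actions.
    """
    return greedy(rng.choice(q_heads))


def ensemble_stats(q_heads):
    """Per-action mean and variance across heads, a simple uncertainty estimate."""
    n = len(q_heads)
    n_actions = len(q_heads[0])
    means = [sum(h[a] for h in q_heads) / n for a in range(n_actions)]
    variances = [
        sum((h[a] - means[a]) ** 2 for h in q_heads) / n for a in range(n_actions)
    ]
    return means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy Q-values for three actions from three hypothetical ensemble heads.
    q_heads = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.0], [1.1, 0.1, 0.0]]
    means, variances = ensemble_stats(q_heads)
    print("means:", means)
    print("variances:", variances)
    print("epsilon-greedy action:", epsilon_greedy(means, 0.1, rng))
    print("ensemble-sampled action:", ensemble_thompson(q_heads, rng))
```

In this toy case all heads rank the first action highest, so sampling a head never picks another action, whereas epsilon-greedy still chooses randomly a fraction epsilon of the time. The other methods named in the abstract would supply the sampled Q-values differently (for example, by sampling network weights or dropout masks), which this sketch does not attempt to show.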