ML CL NE | Nov 29, 2017

A Benchmarking Environment for Reinforcement Learning Based Task Oriented Dialogue Management

Iñigo Casanueva, Paweł Budzianowski, Pei-Hao Su, Nikola Mrkšić, Tsung-Hsien Wen, Stefan Ultes, Lina Rojas-Barahona, Steve Young, Milica Gašić

arXiv:1711.11023v2 | 15.554 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses fair model comparison and reproducibility in dialogue-systems research, although it is incremental, building on existing tools and methods.

The paper tackles the lack of a common benchmarking framework for comparing reinforcement learning models in task-oriented dialogue management. It proposes a set of challenging simulated environments and provides baseline comparisons of algorithms such as DQN, A2C, Natural Actor-Critic and GP-SARSA.

Dialogue assistants are rapidly becoming an indispensable daily aid. To avoid the significant effort needed to hand-craft the required dialogue flow, the Dialogue Management (DM) module can be cast as a continuous Markov Decision Process (MDP) and trained through Reinforcement Learning (RL). Several RL models have been investigated over recent years. However, the lack of a common benchmarking framework makes it difficult to perform a fair comparison between different models and their capability to generalise to different environments. Therefore, this paper proposes a set of challenging simulated environments for dialogue model development and evaluation. To provide some baselines, we investigate a number of representative parametric algorithms, namely the deep reinforcement learning algorithms DQN, A2C and Natural Actor-Critic, and compare them to a non-parametric model, GP-SARSA. Both the environments and policy models are implemented using the publicly available PyDial toolkit and released online, in order to establish a testbed framework for further experiments and to facilitate experimental reproducibility.
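The abstract's central framing, a dialogue manager cast as an MDP whose policy is learned by RL and then compared across environments of varying difficulty, can be illustrated with a toy sketch. Everything below is a hypothetical assumption rather than the paper's setup: the three-slot slot-filling environment, the `error_rate` noise parameter standing in for input errors, the rewards (+20 for a successful dialogue, -1 per turn), and the use of tabular Q-learning in place of the DQN, A2C, Natural Actor-Critic and GP-SARSA agents the paper benchmarks. It is not PyDial code and does not reproduce the paper's environments.

```python
"""Toy slot-filling dialogue MDP trained with tabular Q-learning.

Hypothetical illustration only: NOT PyDial's API, and tabular Q-learning
stands in for the paper's DQN / A2C / NAC / GP-SARSA agents.
"""
import random
from collections import defaultdict

N_SLOTS = 3               # hypothetical number of slots the user must specify
INFORM = N_SLOTS          # action index for "offer a result to the user"
N_ACTIONS = N_SLOTS + 1   # request(slot_0 .. slot_{N-1}) plus inform
MAX_TURNS = 20            # hypothetical cap on dialogue length


def step(state, action, error_rate, rng):
    """Apply one system action; return (next_state, reward, done)."""
    if action == INFORM:
        # The dialogue succeeds only if every slot was filled before informing.
        return state, (20.0 if all(state) else 0.0), True
    filled = list(state)
    # With probability error_rate the user's answer is misrecognised and the
    # slot stays empty (a crude, assumed stand-in for input noise).
    if rng.random() >= error_rate:
        filled[action] = True
    return tuple(filled), -1.0, False


def greedy(q, state):
    """Pick the action with the highest estimated Q-value (first on ties)."""
    return max(range(N_ACTIONS), key=lambda a: q[(state, a)])


def train(error_rate, episodes=5000, alpha=0.2, gamma=0.99, eps=0.2, seed=0):
    """Learn Q(s, a) with one-step Q-learning and epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = (False,) * N_SLOTS
        for _ in range(MAX_TURNS):
            if rng.random() < eps:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action, error_rate, rng)
            # Target r + gamma * max_a' Q(s', a'); no bootstrap at terminal.
            best_next = max(q[(nxt, a)] for a in range(N_ACTIONS))
            target = reward if done else reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def evaluate(q, error_rate, episodes=200, seed=1):
    """Run the greedy policy; return (success_rate, mean_turns)."""
    rng = random.Random(seed)
    successes, total_turns = 0, 0
    for _ in range(episodes):
        state = (False,) * N_SLOTS
        for t in range(1, MAX_TURNS + 1):
            state, reward, done = step(state, greedy(q, state), error_rate, rng)
            if done:
                successes += int(reward > 0)
                break
        total_turns += t
    return successes / episodes, total_turns / episodes


if __name__ == "__main__":
    # Same agent, environments of increasing (hypothetical) noise.
    for err in (0.0, 0.15, 0.3):
        rate, turns = evaluate(train(err), err)
        print(f"error_rate={err:.2f} success={rate:.2f} turns={turns:.1f}")
```

Tabular Q-learning is only viable here because the toy state space has 2^3 = 8 states; real dialogue belief states are continuous, which is why the agents compared in the paper rely on function approximation (neural networks for DQN, A2C and Natural Actor-Critic, Gaussian processes for GP-SARSA). Evaluating one agent across several noise levels mirrors, in miniature, the idea of benchmarking across multiple environments.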

