CL · Sep 18, 2017

Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models

arXiv:1709.06136v111.1104 citations

Originality: Incremental advance

AI Analysis

This work addresses a key bottleneck for developers of task-oriented dialog systems: RL-based policy learning normally depends on a pre-built user simulator. It offers an incremental improvement over existing RL approaches by training the user simulator jointly with the dialog agent instead.

The paper tackles the difficulty of building reliable user simulators for dialog policy learning. It proposes a deep reinforcement learning framework that jointly optimizes a dialog agent and a user simulator through iterative dialog simulations, and it reports higher task success rates and rewards than the baseline methods.

In this paper, we present a deep reinforcement learning (RL) framework for iterative dialog policy optimization in end-to-end task-oriented dialog systems. Popular approaches to learning dialog policy with RL include letting a dialog agent learn against a user simulator. Building a reliable user simulator, however, is not trivial and is often as difficult as building a good dialog agent. We address this challenge by jointly optimizing the dialog agent and the user simulator with deep RL by simulating dialogs between the two agents. We first bootstrap a basic dialog agent and a basic user simulator by learning directly from dialog corpora with supervised training. We then improve them further by letting the two agents conduct task-oriented dialogs and iteratively optimizing their policies with deep RL. Both the dialog agent and the user simulator are designed as neural network models that can be trained end-to-end. Our experimental results show that the proposed method leads to promising improvements in task success rate and total task reward compared to supervised training and single-agent RL training baseline models.
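The two-stage recipe in the abstract (supervised bootstrapping, then iterative RL between the two agents) can be illustrated with a toy sketch. This is not the paper's implementation: the neural models are replaced by small tabular softmax policies, each dialog is a single turn, the update is plain REINFORCE with a constant baseline, and the round-by-round alternation between agent and simulator updates is an assumed schedule. All names (`supervised_bootstrap`, `iterative_rl`, `run_demo`, the toy corpus) are hypothetical.

```python
import math
import random

N = 3  # toy size: number of user goals, utterance tokens, and agent actions


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    r, acc = rng.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1


def nudge(logits, chosen, scale):
    """One gradient-ascent step on log softmax(logits)[chosen], scaled by `scale`."""
    probs = softmax(logits)
    for i in range(len(logits)):
        logits[i] += scale * ((1.0 if i == chosen else 0.0) - probs[i])


def supervised_bootstrap(user, agent, corpus, lr=0.5):
    """Stage 1 (assumed form): fit both policies to logged (goal, utterance, action) triples."""
    for goal, utterance, action in corpus:
        nudge(user[goal], utterance, lr)     # user simulator: goal -> utterance
        nudge(agent[utterance], action, lr)  # dialog agent: utterance -> action


def simulate(user, agent, rng):
    """One single-turn 'dialog': the task succeeds if the agent's action matches the goal."""
    goal = rng.randrange(N)
    utterance = sample(softmax(user[goal]), rng)
    action = sample(softmax(agent[utterance]), rng)
    reward = 1.0 if action == goal else 0.0
    return goal, utterance, action, reward


def success_rate(user, agent, rng, episodes=2000):
    return sum(simulate(user, agent, rng)[3] for _ in range(episodes)) / episodes


def iterative_rl(user, agent, rng, rounds=20, episodes=200, lr=0.2):
    """Stage 2 (assumed schedule): alternate REINFORCE updates between the two policies."""
    for r in range(rounds):
        update_agent = (r % 2 == 0)  # even rounds train the agent, odd rounds the simulator
        for _ in range(episodes):
            goal, utterance, action, reward = simulate(user, agent, rng)
            advantage = reward - 1.0 / N  # constant baseline: chance-level success
            if update_agent:
                nudge(agent[utterance], action, lr * advantage)
            else:
                nudge(user[goal], utterance, lr * advantage)


def run_demo(seed=0):
    rng = random.Random(seed)
    user = [[0.0] * N for _ in range(N)]   # logits indexed [goal][utterance]
    agent = [[0.0] * N for _ in range(N)]  # logits indexed [utterance][action]
    corpus = [(g, g, g) for g in range(N)] * 2  # tiny made-up "dialog corpus"
    supervised_bootstrap(user, agent, corpus)
    before = success_rate(user, agent, rng)
    iterative_rl(user, agent, rng)
    after = success_rate(user, agent, rng)
    return before, after


if __name__ == "__main__":
    before, after = run_demo()
    print(f"success after supervised bootstrap: {before:.2f}")
    print(f"success after iterative RL:         {after:.2f}")
```

In this sketch both policies receive the same task-success reward, so the setting is fully cooperative. The reward design, model architectures, and multi-turn dialog structure of the actual system are not specified in the abstract and are not modeled here.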
