CL AI LG

Oct 31, 2017

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Yun-Nung Chen, Kam-Fai Wong

arXiv:1710.11277v25.974 citations

h-index: 91

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of slow policy learning faced by developers of task-oriented dialogue systems, though the contribution appears incremental because it builds on existing A2C and GAN concepts.

The paper tackles the problem of inefficient dialogue policy learning in task-completion systems by introducing adversarial advantage actor-critic (Adversarial A2C), which accelerates policy exploration, as shown in experiments on a movie-ticket booking domain.

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore the state-action space within the regions where the agent takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration efficiently.
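The idea in the abstract, a discriminator acting as a second critic next to the usual A2C critic, can be illustrated with a minimal sketch. This is not the authors' implementation: the logistic-regression discriminator, the function names `train_discriminator` and `combined_advantage`, the `lam` mixing weight, and the use of `log D(s, a)` as the extra term are all assumptions chosen for illustration. State-action pairs are assumed to be already encoded as fixed-length feature vectors.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_discriminator(expert_x, agent_x, lr=0.5, steps=500):
    """Fit a logistic-regression discriminator (hypothetical stand-in for
    the paper's discriminator): label 1 = expert pair, 0 = agent pair."""
    x = np.vstack([expert_x, agent_x])
    y = np.concatenate([np.ones(len(expert_x)), np.zeros(len(agent_x))])
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(x @ w + b)
        grad = p - y  # gradient of binary cross-entropy w.r.t. the logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def discriminator_score(x, w, b):
    """Estimated probability that each state-action vector is expert-like."""
    return sigmoid(x @ w + b)


def combined_advantage(reward, value, next_value, d_score, gamma=0.99, lam=0.5):
    """One-step TD advantage from the value critic, plus a bonus from the
    discriminator. The additive log-score form and `lam` are assumptions."""
    td_advantage = reward + gamma * next_value - value
    return td_advantage + lam * np.log(d_score + 1e-8)
```

Under these assumptions, an action the discriminator judges expert-like receives a larger advantage than an otherwise identical action it judges agent-like, which nudges the policy gradient toward expert-like regions of the state-action space.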
