CL · AI · LG · Oct 31, 2017

Adversarial Advantage Actor-Critic Model for Task-Completion Dialogue Policy Learning

arXiv:1710.11277v2 · 74 citations
Originality: Incremental advance
AI Analysis

This paper addresses slow policy learning, a practical challenge for developers of task-oriented dialogue systems. The contribution appears incremental because it builds on existing A2C and GAN concepts.

The paper tackles the problem of inefficient dialogue policy learning in task-completion systems by introducing adversarial advantage actor-critic (Adversarial A2C), which accelerates policy exploration, as shown in experiments on a movie-ticket booking domain.

This paper presents a new method, adversarial advantage actor-critic (Adversarial A2C), which significantly improves the efficiency of dialogue policy learning in task-completion dialogue systems. Inspired by generative adversarial networks (GAN), we train a discriminator to differentiate responses/actions generated by dialogue agents from responses/actions by experts. Then, we incorporate the discriminator as another critic into the advantage actor-critic (A2C) framework, to encourage the dialogue agent to explore regions of the state-action space where it takes actions similar to those of the experts. Experimental results in a movie-ticket booking domain show that the proposed Adversarial A2C can accelerate policy exploration efficiently.
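
To make the idea in the abstract concrete, the following is a minimal sketch of how a discriminator score might act as a second critic alongside a one-step A2C advantage. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the `weight` parameter, and the choice of adding a weighted log D(s, a) term to the advantage are all hypothetical, and the discriminator loss shown is the standard GAN-style binary cross-entropy.

```python
import math


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def discriminator_loss(d_expert, d_agent, eps=1e-8):
    """GAN-style binary cross-entropy for the discriminator.

    d_expert: D's probability that an expert (s, a) pair is expert-like.
    d_agent:  D's probability that an agent (s, a) pair is expert-like.
    """
    return -(math.log(max(d_expert, eps)) + math.log(max(1.0 - d_agent, eps)))


def combined_signal(reward, value_s, value_next, d_agent,
                    weight=0.5, gamma=0.99, done=False, eps=1e-8):
    """Assumed combination: critic advantage plus a weighted log D(s, a).

    Actions the discriminator judges expert-like (d_agent near 1) incur
    almost no penalty; clearly non-expert actions are pushed down.
    """
    advantage = td_advantage(reward, value_s, value_next, gamma, done)
    return advantage + weight * math.log(max(d_agent, eps))


if __name__ == "__main__":
    # Same transition, scored as expert-like vs. not expert-like.
    print(combined_signal(1.0, 0.5, 0.0, d_agent=0.9, done=True))
    print(combined_signal(1.0, 0.5, 0.0, d_agent=0.1, done=True))
```

In this sketch the policy-gradient step would scale the log-probability of the chosen action by `combined_signal` instead of the plain advantage, so exploration is biased toward regions where the agent's actions resemble the expert's.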

