CL
Nov 2, 2018

Sequence Generation with Guider Network

arXiv:1811.00696v1
5 citations
Originality: Incremental advance
AI Analysis

This paper addresses a specific bottleneck in RL-based sequence generation, the sparse-reward problem, and offers an incremental improvement for researchers and practitioners in natural language processing.

The paper tackles the sparse-reward problem in RL-based sequence generation, which causes semantic inconsistency, by introducing a guider network to model the environment and provide intermediate rewards, resulting in improved performance on unconditional and conditional tasks.

Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only available after an entire sequence has been generated. This type of sparse reward tends to ignore the global structural information of a sequence, causing generation of sequences that are semantically inconsistent. In this paper, we present a model-based RL approach to overcome this issue. Specifically, we propose a novel guider network to model the sequence-generation environment, which can assist next-word prediction and provide intermediate rewards for generator optimization. Extensive experiments show that the proposed method leads to improved performance for both unconditional and conditional sequence-generation tasks.
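To make the sparse-reward issue concrete, the following minimal sketch compares the per-step training signal when only a final score is available with the signal when a score is supplied at every step. It is an illustration only, not the paper's guider network: the function name `returns_to_go`, the discount `gamma`, and all reward values are hypothetical, and the dense rewards are made-up numbers standing in for whatever intermediate rewards a guider-like model might provide.

```python
def returns_to_go(rewards, gamma=1.0):
    """Return the discounted sum of future rewards at each time step."""
    out = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates everything that follows it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        out[t] = running
    return out


# Sparse setting: a single scalar score arrives after the last token.
sparse_rewards = [0.0, 0.0, 0.0, 1.0]

# Dense setting: hypothetical intermediate rewards, one per generated token.
dense_rewards = [0.5, -0.25, 0.5, 0.25]

sparse_returns = returns_to_go(sparse_rewards)
dense_returns = returns_to_go(dense_rewards)

print(sparse_returns)
print(dense_returns)
```

With the sparse rewards, every token receives the same return, so the learning signal cannot distinguish which tokens helped. With per-step rewards the returns differ across positions, which is the kind of finer-grained credit assignment that intermediate rewards are meant to enable.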

