Discourse-Aware Neural Rewards for Coherent Text Generation
This work addresses the challenge of maintaining coherence in long-form text generation, and it represents an incremental improvement over existing methods.
The paper tackles the problem of generating long, coherent text by training a generator with reinforcement learning on discourse-aware neural rewards. The resulting generator produces more coherent and less repetitive text than models trained with cross-entropy or with standard reinforcement learning rewards.
In this paper, we investigate the use of discourse-aware rewards with reinforcement learning to guide a model to generate long, coherent text. In particular, we propose to learn neural rewards to model cross-sentence ordering as a means to approximate desired discourse structure. Empirical results demonstrate that a generator trained with the learned reward produces more coherent and less repetitive text than models trained with cross-entropy or with reinforcement learning with commonly used scores as rewards.
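As a rough illustration of how an ordering-based reward could feed a reinforcement learning objective, the sketch below scores a generated sequence of sentences by how much closer it is to a reference in its original order than in reversed order, then plugs that score into a REINFORCE-style loss. Everything here is an assumption made for illustration: the position-weighted bag-of-words `encode` stands in for a learned neural encoder, and the names `ordering_reward` and `policy_gradient_loss`, the cosine comparison, and the baseline term are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def encode(sentences):
    """Toy order-sensitive encoder: a position-weighted bag of words.

    Assumption: this stands in for a learned neural encoder.
    """
    vec = Counter()
    n = len(sentences)
    for i, sent in enumerate(sentences):
        weight = (i + 1) / n  # later sentences receive larger weights
        for tok in sent.lower().split():
            vec[tok] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ordering_reward(generated, reference):
    """Hypothetical reward: similarity to the reference in forward order
    minus similarity in reversed order. Positive values suggest the
    generated sentences follow the reference ordering."""
    ref = encode(reference)
    forward = cosine(encode(generated), ref)
    backward = cosine(encode(generated[::-1]), ref)
    return forward - backward


def policy_gradient_loss(log_probs, reward, baseline):
    """REINFORCE-style loss for one sampled sequence.

    log_probs: per-token log-probabilities of the sampled sequence.
    baseline:  a reference reward used to reduce variance (assumed here).
    """
    return -(reward - baseline) * sum(log_probs)


reference = ["heat the oil", "add the onions", "serve warm"]
in_order = ordering_reward(reference, reference)
shuffled = ordering_reward(reference[::-1], reference)
loss = policy_gradient_loss([-0.5, -1.0], reward=0.8, baseline=0.3)
```

In this toy setup, a sequence that matches the reference order receives a positive reward and its reversal receives a negative one, so minimizing the loss raises the probability of well-ordered samples. A real system would replace `encode` with a trained model and obtain `log_probs` from the generator.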