TextGAIL: Generative Adversarial Imitation Learning for Text Generation
This work addresses the unreliable guiding signals that text GAN discriminators provide, a problem relevant to researchers and practitioners in natural language processing. It is an incremental improvement that integrates existing techniques such as imitation learning and proximal policy optimization (PPO).
The paper tackles the inferior performance of Generative Adversarial Networks (GANs) for text generation relative to Maximum Likelihood Estimation (MLE) methods. It proposes TextGAIL, a framework that uses pre-trained language models for reward guidance. In experiments on unconditional and conditional text generation tasks, TextGAIL achieves better quality and diversity than MLE baselines.
Generative Adversarial Networks (GANs) for text generation have recently drawn considerable criticism, as they perform worse than their MLE counterparts. We suspect that the inferior performance of previous text GANs is due to the lack of a reliable guiding signal in their discriminators. To address this problem, we propose a generative adversarial imitation learning framework for text generation that uses large pre-trained language models to provide more reliable reward guidance. Our approach uses a contrastive discriminator and proximal policy optimization (PPO) to stabilize and improve text generation performance. For evaluation, we conduct experiments on a diverse set of unconditional and conditional text generation tasks. Experimental results show that TextGAIL achieves better performance in terms of both quality and diversity than the MLE baseline. We also validate, through an additional task, our intuition that TextGAIL's discriminator can provide reasonable rewards.
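The abstract names two components, a contrastive discriminator and PPO, without giving their formulas. As a rough illustration only, the sketch below assumes that the discriminator assigns a scalar score to a real sequence and to a generated one and normalizes the pair with a softmax, and it pairs this with the standard PPO clipped surrogate objective for a single sample. The function names, the scalar scores, and the `clip_eps` default are hypothetical choices made for this example and are not taken from the text above.

```python
import math


def contrastive_rewards(score_real, score_generated):
    """Softmax over a (real, generated) pair of discriminator scores.

    Assumption: the discriminator judges the two sequences relative to
    each other, so the two outputs sum to 1. The second value could
    serve as the reward for the generated sequence.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(score_real, score_generated)
    e_real = math.exp(score_real - m)
    e_gen = math.exp(score_generated - m)
    total = e_real + e_gen
    return e_real / total, e_gen / total


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    # Probability ratio between the updated policy and the old policy.
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    # Hypothetical scores: the real sequence is scored higher.
    p_real, p_gen = contrastive_rewards(2.0, 0.0)
    print("reward for generated text:", p_gen)
    # Hypothetical log-probabilities and advantage for one sample.
    print("surrogate:", ppo_clipped_objective(-1.0, -1.2, advantage=p_gen))
```

Under these assumptions, two sequences with equal scores each receive 0.5, and the clipping stops a single update from moving the policy far from the one that produced the samples, which is the stabilizing role the abstract attributes to PPO.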