CLJun 14, 2021

Straight to the Gradient: Learning to Use Novel Tokens for Neural Text Generation

arXiv:2106.07207v130 citations
Originality Incremental advance
AI Analysis

This addresses the degeneration issue in text generation for users of large-scale language models, though it appears incremental as it modifies an existing training objective.

The paper tackles the problem of dull and repetitive text generation in neural language models by introducing ScaleGrad, a modification to the gradient of the loss function, which encourages the use of novel tokens and shows effectiveness in both open-ended and directed generation tasks.

Advanced large-scale neural language models have led to significant success in many language generation tasks. However, the most commonly used training objective, Maximum Likelihood Estimation (MLE), has been shown problematic, where the trained model prefers using dull and repetitive phrases. In this work, we introduce ScaleGrad, a modification straight to the gradient of the loss function, to remedy the degeneration issue of the standard MLE objective. By directly maneuvering the gradient information, ScaleGrad makes the model learn to use novel tokens. Empirical results show the effectiveness of our method not only in open-ended generation, but also in directed generation tasks. With the simplicity in architecture, our method can serve as a general training objective that is applicable to most of the neural text generation tasks.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes