Towards Content Transfer through Grounded Text Generation
The paper addresses a gap in content control for text generation and provides a new benchmark for researchers, but it is incremental in that it builds on existing neural generation work.
The paper tackles the problem of controlling the content of neural text generation by introducing Content Transfer, a task in which a model generates the next sentence of a document so that it both fits the context and is grounded in an external source. Experiments on Wikipedia data show significant improvements over competitive baselines.
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements over competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
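To make the task definition concrete, the following is a minimal sketch of how one Content Transfer instance might be represented. The class name, field names, separator token, and the `build_model_input` helper are all hypothetical and chosen for illustration; the abstract does not specify the dataset schema or how the model is conditioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentTransferExample:
    """One illustrative task instance (field names are assumptions)."""
    context: str  # existing document text that precedes the target sentence
    source: str   # external grounding text, e.g. a news story
    target: str   # reference next sentence the model should generate


def build_model_input(example: ContentTransferExample, sep: str = " <sep> ") -> str:
    """Join the source and the context into one conditioning string.

    The separator token and the source-first ordering are assumptions,
    not details taken from the paper.
    """
    return example.source + sep + example.context


example = ContentTransferExample(
    context="Existing article text.",
    source="Text of the cited news story.",
    target="A new sentence grounded in the story.",
)
model_input = build_model_input(example)
```

A generation model would then be trained to produce `target` from `model_input`, so that the output is judged both on fit with `context` and on fidelity to `source`.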